-
Scale invariant bounds for the Kelvin-Helmholtz instability
Authors:
Konstantin Kalinin,
Govind Menon,
Bian Wu
Abstract:
We derive robust long-time a-priori estimates for the Navier-Stokes equation in a two-dimensional infinite strip which are uniform in the Reynolds number. These estimates provide several new scale invariant upper bounds for the size of the mixing layer in the Kelvin-Helmholtz instability.
Submitted 18 October, 2024;
originally announced October 2024.
-
Environment Scan of Generative AI Infrastructure for Clinical and Translational Science
Authors:
Betina Idnay,
Zihan Xu,
William G. Adams,
Mohammad Adibuzzaman,
Nicholas R. Anderson,
Neil Bahroos,
Douglas S. Bell,
Cody Bumgardner,
Thomas Campion,
Mario Castro,
James J. Cimino,
I. Glenn Cohen,
David Dorr,
Peter L Elkin,
Jungwei W. Fan,
Todd Ferris,
David J. Foran,
David Hanauer,
Mike Hogarth,
Kun Huang,
Jayashree Kalpathy-Cramer,
Manoj Kandpal,
Niranjan S. Karnik,
Avnish Katoch,
Albert M. Lai
, et al. (32 additional authors not shown)
Abstract:
This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) in the United States. With the rapid advancement of GenAI technologies, including large language models (LLMs), healthcare institutions face unprecedented opportunities and challenges. This research explores the current status of GenAI integration, focusing on stakeholder roles, governance structures, and ethical considerations by administering a survey among leaders of health institutions (i.e., representing academic medical centers and health systems) to assess the institutional readiness and approach towards GenAI adoption. Key findings indicate a diverse range of institutional strategies, with most organizations in the experimental phase of GenAI deployment. The study highlights significant variations in governance models, with a strong preference for centralized decision-making but notable gaps in workforce training and ethical oversight. Moreover, the results underscore the need for a more coordinated approach to GenAI governance, emphasizing collaboration among senior leaders, clinicians, information technology staff, and researchers. Our analysis also reveals concerns regarding GenAI bias, data security, and stakeholder trust, which must be addressed to ensure the ethical and effective implementation of GenAI technologies. This study offers valuable insights into the challenges and opportunities of GenAI integration in healthcare, providing a roadmap for institutions aiming to leverage GenAI for improved quality of care and operational efficiency.
Submitted 27 September, 2024;
originally announced October 2024.
-
The Best of Both Worlds: On the Dilemma of Out-of-distribution Detection
Authors:
Qingyang Zhang,
Qiuxuan Feng,
Joey Tianyi Zhou,
Yatao Bian,
Qinghua Hu,
Changqing Zhang
Abstract:
Out-of-distribution (OOD) detection is essential for model trustworthiness which aims to sensitively identify semantic OOD samples and robustly generalize for covariate-shifted OOD samples. However, we discover that the superior OOD detection performance of state-of-the-art methods is achieved by secretly sacrificing the OOD generalization ability. Specifically, the classification accuracy of these models could deteriorate dramatically when they encounter even minor noise. This phenomenon contradicts the goal of model trustworthiness and severely restricts their applicability in real-world scenarios. What is the hidden reason behind such a limitation? In this work, we theoretically demystify the "sensitive-robust" dilemma that lies in many existing OOD detection methods. Consequently, a theory-inspired algorithm is induced to overcome such a dilemma. By decoupling the uncertainty learning objective from a Bayesian perspective, the conflict between OOD detection and OOD generalization is naturally harmonized and a dual-optimal performance could be expected. Empirical studies show that our method achieves superior performance on standard benchmarks. To the best of our knowledge, this work is the first principled OOD detection method that achieves state-of-the-art OOD detection performance without compromising OOD generalization ability. Our code is available at https://github.com/QingyangZhang/DUL.
Submitted 12 October, 2024;
originally announced October 2024.
-
COME: Test-time adaption by Conservatively Minimizing Entropy
Authors:
Qingyang Zhang,
Yatao Bian,
Xinke Kong,
Peilin Zhao,
Changqing Zhang
Abstract:
Machine learning models must continuously adjust themselves to novel data distributions in the open world. As the predominant principle, entropy minimization (EM) has been proven to be a simple yet effective cornerstone in existing test-time adaption (TTA) methods. Unfortunately, its fatal limitation (i.e., overconfidence) tends to result in model collapse. To address this issue, we propose to Conservatively Minimize the Entropy (COME), which is a simple drop-in replacement of traditional EM to elegantly address the limitation. In essence, COME explicitly models the uncertainty by characterizing a Dirichlet prior distribution over model predictions during TTA. By doing so, COME naturally regularizes the model to favor conservative confidence on unreliable samples. Theoretically, we provide a preliminary analysis to reveal the ability of COME in enhancing the optimization stability by introducing a data-adaptive lower bound on the entropy. Empirically, our method achieves state-of-the-art performance on commonly used benchmarks, showing significant improvements in terms of classification accuracy and uncertainty estimation under various settings including standard, life-long and open-world TTA, i.e., up to $34.5\%$ improvement on accuracy and $15.1\%$ on false positive rate.
Submitted 12 October, 2024;
originally announced October 2024.
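The entropy minimization (EM) principle named in the COME abstract above can be illustrated with a minimal sketch. This shows only the plain entropy objective that test-time adaptation methods typically minimize; it is not the COME method, and the function names `softmax` and `entropy` are illustrative choices, not taken from the paper's code.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A sharply peaked prediction has low entropy; a nearly flat one has high entropy.
confident = softmax([8.0, 0.0, 0.0])
uncertain = softmax([0.1, 0.0, 0.0])
print(entropy(confident) < entropy(uncertain))
```

Minimizing this quantity pushes predictions towards the peaked case, which is the overconfidence risk the abstract describes.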
-
LISAC: Learned Coded Waveform Design for ISAC with OFDM
Authors:
Chenghong Bian,
Yumeng Zhang,
Deniz Gunduz
Abstract:
We propose a novel deep learning based method to design a coded waveform for integrated sensing and communication (ISAC) systems based on orthogonal frequency-division multiplexing (OFDM). Our ultimate goal is to design a coded waveform that provides satisfactory sensing performance of the target while maintaining high communication quality measured in terms of the bit error rate (BER). The proposed LISAC provides an improved waveform design with the assistance of deep neural networks for the encoding and decoding of the information bits. In particular, the transmitter, parameterized by a recurrent neural network (RNN), encodes the input bit sequence into the transmitted waveform for both sensing and communications. The receiver employs an RNN-based decoder to decode the information bits while the transmitter senses the target via maximum likelihood detection. We optimize the system considering both the communication and sensing performance. Simulation results show that the proposed LISAC waveform achieves a better trade-off curve compared to existing alternatives.
Submitted 14 October, 2024;
originally announced October 2024.
-
Measurement of the double-differential cross section of muon-neutrino charged-current interactions with low hadronic energy in the NOvA Near Detector
Authors:
M. A. Acero,
B. Acharya,
P. Adamson,
L. Aliaga,
N. Anfimov,
A. Antoshkin,
E. Arrieta-Diaz,
L. Asquith,
A. Aurisano,
A. Back,
N. Balashov,
P. Baldi,
B. A. Bambah,
E. Bannister,
A. Barros,
S. Bashar,
A. Bat,
K. Bays,
R. Bernstein,
T. J. C. Bezerra,
V. Bhatnagar,
D. Bhattarai,
B. Bhuyan,
J. Bian,
A. C. Booth
, et al. (183 additional authors not shown)
Abstract:
The NOvA collaboration reports cross-section measurements for $ν_μ$ charged-current interactions with low hadronic energy (maximum kinetic energy of 250 MeV for protons and 175 MeV for pions) in the NOvA Near Detector. The results are presented as a double-differential cross section as a function of the direct observables of the final-state muon kinematics. Results are also presented as a single-differential cross section as a function of the derived square of the four-momentum transfer, $Q^{2}$, and as a function of the derived neutrino energy. The data correspond to an accumulated 8.09$\times10^{20}$ protons-on-target (POT) in the neutrino mode of the NuMI beam, with a narrow band of neutrino energies peaked at 1.8 GeV. The analysis provides a sample of neutrino-nucleus interactions with an enhanced fraction of quasi-elastic and two-particle-two-hole (2p2h) interactions. This enhancement allows quantitative comparisons with various nuclear models. We find strong disagreement between data and theory-based models in various regions of the muon kinematic phase space, especially in the forward muon direction.
Submitted 14 October, 2024;
originally announced October 2024.
-
Angular Power Spectrum of TeV-PeV Cosmic Ray Anisotropies
Authors:
Wenyi Bian,
Gwenael Giacinti,
Brian Reville
Abstract:
Simulations of the cosmic-ray (CR) anisotropy down to TeV energies are presented, using turbulence parameters consistent with those inferred from observations of the interstellar medium. We compute the angular power spectra $C_{\ell}$ of the CR anisotropy obtained from the simulations. We demonstrate that the amplitude of the large scale gradient in the CR density profile affects only the overall normalisation of the $C_{\ell}$s, without affecting the shape of the angular power spectrum. We show that the power spectrum depends on CR energy, and that it is sensitive to the location of the observer at small $\ell$. It is found to flatten at large $\ell$, and can be modelled by a broken power-law, exhibiting a break at $\ell \approx 4$. Our computed power spectrum at $\sim 10\,$TeV fits HAWC and IceCube measurements well. Moreover, we calculate all coefficients of the spherical harmonics and compute the component of the angular power spectrum projected onto the direction of the local magnetic field line. We find that deviations from gyrotropy become increasingly important at higher CR energies and larger values of $\ell$.
Submitted 12 October, 2024;
originally announced October 2024.
-
A New Approach of Data-driven Simulation and Its Application to Solar Active Region 12673
Authors:
Zhi-Peng Liu,
Chao-Wei Jiang,
Xin-Kai Bian,
Qing-Jun Liu,
Peng Zou,
Xue-Shang Feng
Abstract:
The solar coronal magnetic field is a pivotal element in the study of eruptive phenomena, and understanding its dynamic evolution has long been a focal point in solar physics. Numerical models, driven directly by observation data, serve as indispensable tools in investigating the dynamics of the coronal magnetic field. This paper presents a new approach to electric field inversion, which involves modifying the electric field derived from the DAVE4VM velocity field using the ideal Ohm's law. The time series of the modified electric field is used as a boundary condition to drive an MHD model, which is applied to simulate the magnetic field evolution of active region 12673. The simulation results demonstrate that our method enhances the magnetic energy injection through the bottom boundary, as compared with energy injection calculated directly from the DAVE4VM code, and reproduces the evolution of the photospheric magnetic flux. The coronal magnetic field structure is also morphologically similar to the coronal loops. This new approach will be applied to the high-accuracy simulation of eruption phenomena and provide more details on the dynamical evolution of the coronal magnetic field.
Submitted 12 October, 2024;
originally announced October 2024.
-
Constraints on Covariant Horava-Lifshitz Gravity from precision measurement of planetary gravitomagnetic field
Authors:
Li-dong Zhang,
Li-Fang Li,
Peng Xu,
Xing Bian,
Ziren Luo
Abstract:
As a generalization of Einstein's theory, Horava-Lifshitz gravity has attracted significant interest due to its healthy ultraviolet behavior. In this paper, we analyze the impact of the Horava-Lifshitz corrections on the gravitomagnetic field. We propose a new planetary gravitomagnetic field measurement method with the help of space-based laser interferometry, which is further used to constrain the Horava-Lifshitz parameters. Our analysis shows that high-precision laser gradiometers can indeed limit the parameters in Horava-Lifshitz gravity and improve the results by one or two orders of magnitude when compared with the existing theories. Our novel method provides insights into constraining the parameters in the modified gravitational theory, facilitating a deeper understanding of this complex framework and paving the way for potential technological advancements in the field.
Submitted 12 October, 2024;
originally announced October 2024.
-
Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion
Authors:
Jiaxing Xu,
Mengcheng Lan,
Xia Dong,
Kai He,
Wei Zhang,
Qingtian Bian,
Yiping Ke
Abstract:
In the realm of neuroscience, identifying distinctive patterns associated with neurological disorders via brain networks is crucial. Resting-state functional magnetic resonance imaging (fMRI) serves as a primary tool for mapping these networks by correlating blood-oxygen-level-dependent (BOLD) signals across different brain regions, defined as regions of interest (ROIs). Constructing these brain networks involves using atlases to parcellate the brain into ROIs based on various hypotheses of brain division. However, there is no standard atlas for brain network classification, leading to limitations in detecting abnormalities in disorders. Some recent methods have proposed utilizing multiple atlases, but they neglect consistency across atlases and lack ROI-level information exchange. To tackle these limitations, we propose an Atlas-Integrated Distillation and Fusion network (AIDFusion) to improve brain network classification using fMRI data. AIDFusion addresses the challenge of utilizing multiple atlases by employing a disentangle Transformer to filter out inconsistent atlas-specific information and distill distinguishable connections across atlases. It also incorporates subject- and population-level consistency constraints to enhance cross-atlas consistency. Additionally, AIDFusion employs an inter-atlas message-passing mechanism to fuse complementary information across brain regions. Experimental results on four datasets of different diseases demonstrate the effectiveness and efficiency of AIDFusion compared to state-of-the-art methods. A case study illustrates that AIDFusion extracts patterns that are both interpretable and consistent with established neuroscience findings.
Submitted 28 September, 2024;
originally announced October 2024.
-
A New Statistical Analysis of the Morphology of Spiral Galaxies
Authors:
Junye Wei,
Ye Xu,
Zehao Lin,
Chaojie Hao,
Yingjie Li,
Dejian Liu,
Shuaibo Bian
Abstract:
Morphology is the starting point for understanding galaxies. Elmegreen et al. classified spiral galaxies into flocculent, multiple-arm, and grand-design galaxies based on the regularity of their spiral arm structure. With the release of a vast number of clear spiral galaxy images from the Sloan Digital Sky Survey, we conducted a morphological classification of 5093 blue spiral galaxies. A statistical analysis of this sample shows that the fractions of flocculent, multiple-arm, and grand-design galaxies are 38 $\pm$ 1%, 59 $\pm$ 1%, and 3 $\pm$ 1%, respectively. Redshift has no obvious influence on this classification. However, as the bulge size becomes larger, the fraction of multiple-arm galaxies increases, while that of flocculent galaxies decreases. In addition, we performed a statistical analysis of 3958 galaxies with a clear spiral arm structure, finding 82% of these galaxies have two arms in their inner regions. We also found that the majority (74%) of the barred spiral galaxies exhibit the characteristics of two inner spiral arms and multiple outer spiral arms, and there is no barred spiral galaxy in this work with four continuous spiral arms from the inner to the outer regions. These results highlight that the spiral arm structure of the Milky Way, according to the current mainstream view of a four-arm galaxy with continuous arms extending from the inner to outer regions, is quite unique. However, our findings align with the spiral morphology of the Milky Way proposed by Xu et al., in which case our Galaxy can be considered typical.
Submitted 10 October, 2024;
originally announced October 2024.
-
ConML: A Universal Meta-Learning Framework with Task-Level Contrastive Learning
Authors:
Shiguang Wu,
Yaqing Wang,
Yatao Bian,
Quanming Yao
Abstract:
Meta-learning enables learning systems to adapt quickly to new tasks, similar to humans. To emulate this human-like rapid learning and enhance alignment and discrimination abilities, we propose ConML, a universal meta-learning framework that can be applied to various meta-learning algorithms without relying on specific model architectures or target models. The core of ConML is task-level contrastive learning, which extends contrastive learning from the representation space in unsupervised learning to the model space in meta-learning. By leveraging task identity as an additional supervision signal during meta-training, we contrast the outputs of the meta-learner in the model space, minimizing inner-task distance (between models trained on different subsets of the same task) and maximizing inter-task distance (between models from different tasks). We demonstrate that ConML integrates seamlessly with optimization-based, metric-based, and amortization-based meta-learning algorithms, as well as in-context learning, resulting in performance improvements across diverse few-shot learning tasks.
Submitted 14 October, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Measurement of d2sigma/d|q|dEavail in charged current neutrino-nucleus interactions at <Ev> = 1.86 GeV using the NOvA Near Detector
Authors:
M. A. Acero,
B. Acharya,
P. Adamson,
L. Aliaga,
N. Anfimov,
A. Antoshkin,
E. Arrieta-Diaz,
L. Asquith,
A. Aurisano,
A. Back,
N. Balashov,
P. Baldi,
B. A. Bambah,
E. Bannister,
A. Barros,
S. Bashar,
A. Bat,
K. Bays,
R. Bernstein,
T. J. C. Bezerra,
V. Bhatnagar,
D. Bhattarai,
B. Bhuyan,
J. Bian,
A. C. Booth
, et al. (183 additional authors not shown)
Abstract:
Double- and single-differential cross sections for inclusive charged-current neutrino-nucleus scattering are reported for the kinematic domain 0 to 2 GeV/c in three-momentum transfer and 0 to 2 GeV in available energy, at a mean muon-neutrino energy of 1.86 GeV. The measurements are based on an estimated 995,760 muon-neutrino CC interactions in the scintillator medium of the NOvA Near Detector. The subdomain populated by 2-particle-2-hole reactions is identified by the cross-section excess relative to predictions for neutrino-nucleus scattering that are constrained by a data control sample. Models for 2-particle-2-hole processes are rated by chi-square comparisons of the predicted-versus-measured muon-neutrino CC inclusive cross section over the full phase space and in the restricted subdomain. Shortfalls are observed in neutrino generator predictions obtained using the theory-based Valencia and SuSAv2 2p2h models.
Submitted 7 October, 2024;
originally announced October 2024.
-
Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality
Authors:
Lei You,
Yijun Bian,
Lele Cao
Abstract:
Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Despite the diverse scenarios, goals and tasks to which they are tailored, existing CE methods often lack actionable efficiency because of unnecessary feature changes included within the explanations that are presented to users and stakeholders. We address this problem by proposing a method that minimizes the required feature changes while maintaining the validity of CE, without imposing restrictions on models or CE algorithms, whether instance- or group-based. The key innovation lies in computing a joint distribution between observed and counterfactual data and leveraging it to inform Shapley values for feature attributions (FA). We demonstrate that optimal transport (OT) effectively derives this distribution, especially when the alignment between observed and counterfactual data is unclear in the CE methods used. Additionally, a counterintuitive finding is uncovered: it may be misleading to rely on an exact alignment defined by the CE generation mechanism in conducting FA. Our proposed method is validated on extensive experiments across multiple datasets, showcasing its effectiveness in refining CE towards greater actionable efficiency.
Submitted 7 October, 2024;
originally announced October 2024.
-
Precision Knowledge Editing: Enhancing Safety in Large Language Models
Authors:
Xuying Li,
Zhuo Li,
Yuji Kosuga,
Yasuhiro Yoshida,
Victor Bian
Abstract:
Large language models (LLMs) have demonstrated remarkable capabilities, but they also pose risks related to the generation of toxic or harmful content. This work introduces Precision Knowledge Editing (PKE), an advanced technique that builds upon existing knowledge editing methods to more effectively identify and modify toxic parameter regions within LLMs. By leveraging neuron weight tracking and activation pathway tracing, PKE achieves finer granularity in toxic content management compared to previous methods like Detoxifying Instance Neuron Modification (DINM). Our experiments demonstrate that PKE significantly reduces the attack success rate (ASR) across various models, including Llama2-7b and Llama-3-8b-instruct, while maintaining overall model performance. We also compared the performance of some closed-source models (gpt-4-0613 and Claude 3 Sonnet) in our experiments, and found that models adjusted using our method far outperformed the closed-source models in terms of safety. This research contributes to the ongoing efforts to make LLMs safer and more reliable for real-world applications.
Submitted 2 October, 2024;
originally announced October 2024.
-
C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front
Authors:
Ruohong Liu,
Yuxin Pan,
Linjie Xu,
Lei Song,
Pengcheng You,
Yize Chen,
Jiang Bian
Abstract:
Multi-objective reinforcement learning (MORL) excels at handling rapidly changing preferences in tasks that involve multiple criteria, even for unseen preferences. However, previous dominating MORL methods typically generate a fixed policy set or preference-conditioned policy through multiple training iterations exclusively for sampled preference vectors, and cannot ensure the efficient discovery of the Pareto front. Furthermore, integrating preferences into the input of policy or value functions presents scalability challenges, in particular as the dimensions of the state and preference spaces grow, which can complicate the learning process and hinder the algorithm's performance on more complex tasks. To address these issues, we propose a two-stage Pareto front discovery algorithm called Constrained MORL (C-MORL), which serves as a seamless bridge between constrained policy optimization and MORL. Concretely, a set of policies is trained in parallel in the initialization stage, with each optimized towards its individual preference over the multiple objectives. Then, to fill the remaining vacancies in the Pareto front, the constrained optimization steps are employed to maximize one objective while constraining the other objectives to exceed a predefined threshold. Empirically, compared to recent advancements in MORL methods, our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks, especially with numerous objectives (up to nine objectives in our experiments).
Submitted 3 October, 2024;
originally announced October 2024.
-
MedQA-CS: Benchmarking Large Language Models Clinical Skills Using an AI-SCE Framework
Authors:
Zonghai Yao,
Zihao Zhang,
Chaolong Tang,
Xingyu Bian,
Youxia Zhao,
Zhichao Yang,
Junda Wang,
Huixue Zhou,
Won Seok Jang,
Feiyun Ouyang,
Hong Yu
Abstract:
Artificial intelligence (AI) and large language models (LLMs) in healthcare require advanced clinical skills (CS), yet current benchmarks fail to evaluate these comprehensively. We introduce MedQA-CS, an AI-SCE framework inspired by medical education's Objective Structured Clinical Examinations (OSCEs), to address this gap. MedQA-CS evaluates LLMs through two instruction-following tasks, LLM-as-medical-student and LLM-as-CS-examiner, designed to reflect real clinical scenarios. Our contributions include developing MedQA-CS, a comprehensive evaluation framework with publicly available data and expert annotations, and providing the quantitative and qualitative assessment of LLMs as reliable judges in CS evaluation. Our experiments show that MedQA-CS is a more challenging benchmark for evaluating clinical skills than traditional multiple-choice QA benchmarks (e.g., MedQA). Combined with existing benchmarks, MedQA-CS enables a more comprehensive evaluation of LLMs' clinical capabilities for both open- and closed-source LLMs.
Submitted 2 October, 2024;
originally announced October 2024.
-
FlashMask: Efficient and Rich Mask Extension of FlashAttention
Authors:
Guoxia Wang,
Jinle Zeng,
Xiyuan Xiao,
Siming Wu,
Jiabin Yang,
Lujing Zheng,
Zeyu Chen,
Jiang Bian,
Dianhai Yu,
Haifeng Wang
Abstract:
The computational and memory demands of vanilla attention scale quadratically with the sequence length $N$, posing significant challenges for processing long sequences in Transformer models. FlashAttention alleviates these challenges by eliminating the $O(N^2)$ memory dependency and reducing attention latency through IO-aware memory optimizations. However, its native support for certain attention mask types is limited, and it does not inherently accommodate more complex masking requirements. Previous approaches resort to using dense masks with $O(N^2)$ memory complexity, leading to inefficiencies. In this paper, we propose FlashMask, an extension of FlashAttention that introduces a column-wise sparse representation of attention masks. This approach efficiently represents a wide range of mask types and facilitates the development of optimized kernel implementations. By adopting this novel representation, FlashMask achieves linear memory complexity $O(N)$, suitable for modeling long-context sequences. Moreover, this representation enables kernel optimizations that eliminate unnecessary computations by leveraging sparsity in the attention mask, without sacrificing computational accuracy, resulting in higher computational efficiency. We evaluate FlashMask's performance in LLM fine-tuning and alignment training workloads such as SFT, LoRA, DPO, and RM. FlashMask achieves significant throughput improvements, with end-to-end speedups ranging from 1.65x to 3.22x compared to the existing FlashAttention dense method. Additionally, our kernel-level comparisons demonstrate that FlashMask surpasses the latest counterpart, FlexAttention, by 12.1% to 60.7% in terms of kernel TFLOPs/s, achieving 37.8% to 62.3% of the theoretical maximum FLOPs/s on the A100 GPU. The code is open-sourced on PaddlePaddle and integrated into PaddleNLP, supporting models with over 100 billion parameters for contexts up to 128K tokens.
Submitted 2 October, 2024;
originally announced October 2024.
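Illustrative note on the FlashMask entry above: the idea of a column-wise sparse mask can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the paper's data layout or kernel. It assumes a causal mask over packed documents, and the function names `column_mask_starts` and `dense_mask` are hypothetical. Each key column stores a single integer (the first query row that may no longer attend to it), so storage is $O(N)$ instead of $O(N^2)$.

```python
def column_mask_starts(doc_lengths):
    """For each key column j, return the first query row that must NOT attend to it.

    Assumption: documents are packed back to back and attention is causal, so
    column j is visible to rows j .. doc_end(j) - 1. One integer per column
    (O(N) in total) is enough to describe the whole mask.
    """
    starts = []
    doc_end = 0
    for length in doc_lengths:
        doc_end += length
        starts.extend([doc_end] * length)
    return starts


def dense_mask(starts):
    """Expand the per-column encoding into an N x N boolean matrix (True = attend).

    Only used here to check the compact encoding; a real kernel would read
    `starts` directly and skip fully masked blocks.
    """
    n = len(starts)
    return [[j <= i < starts[j] for j in range(n)] for i in range(n)]


starts = column_mask_starts([3, 2])  # two packed documents, N = 5
allowed = dense_mask(starts)
for row in allowed:
    print("".join("x" if a else "." for a in row))
```

The dense expansion is block lower-triangular: a token attends only to earlier tokens of its own document, which is the pattern the compact per-column integers encode.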
-
A semiclassical non-adiabatic phase-space approach to molecular translations and rotations: A new picture of surface hopping and electronic inertial effects
Authors:
Xuezhi Bian,
Yanze Wu,
Tian Qiu,
Tao Zhen,
Joseph E. Subotnik
Abstract:
We present a novel semiclassical phase-space surface hopping approach that goes beyond the Born-Oppenheimer approximation and all existing surface hopping formalisms. We demonstrate that working with a correct phase-space electronic Hamiltonian can capture electronic inertial effects during pure nuclear translational and rotational motion and completely eliminate (at least to very high order) non-adiabatic transitions between electronic eigenstates. This work opens many new avenues for quantitatively investigating complex phenomena, including angular momentum transfer between chiral phonons and electrons as well as chiral-induced spin selectivity effects.
Submitted 1 October, 2024;
originally announced October 2024.
-
Search for proton decay via $p\rightarrow{e^+η}$ and $p\rightarrow{μ^+η}$ with a 0.37 Mton-year exposure of Super-Kamiokande
Authors:
Super-Kamiokande Collaboration,
N. Taniuchi,
K. Abe,
S. Abe,
Y. Asaoka,
C. Bronner,
M. Harada,
Y. Hayato,
K. Hiraide,
K. Hosokawa,
K. Ieki,
M. Ikeda,
J. Kameda,
Y. Kanemura,
R. Kaneshima,
Y. Kashiwagi,
Y. Kataoka,
S. Miki,
S. Mine,
M. Miura,
S. Moriyama,
M. Nakahata,
S. Nakayama,
Y. Noguchi
, et al. (267 additional authors not shown)
Abstract:
A search for proton decay into $e^+/μ^+$ and an $η$ meson has been performed using data from a 0.373 Mton$\cdot$year exposure (6050.3 live days) of Super-Kamiokande. Compared to previous searches, this work introduces an improved model of the intranuclear $η$ interaction cross section, resulting in a factor of two reduction in uncertainties from this source and a $\sim$10% increase in signal efficiency. No significant data excess was found above the expected number of atmospheric neutrino background events, resulting in no indication of proton decay into either mode. Lower limits on the proton partial lifetime of $1.4\times 10^{34}$ years for $p\rightarrow e^+η$ and $7.3\times 10^{33}$ years for $p\rightarrow μ^+η$ at the 90% C.L. were set. These limits are around 1.5 times longer than those of our previous study and are the most stringent to date.
Submitted 29 September, 2024;
originally announced September 2024.
-
The hypothetical track-length fitting algorithm for energy measurement in liquid argon TPCs
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
N. S. Alex,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
H. Amar,
P. Amedo,
J. Anderson,
C. Andreopoulos
, et al. (1348 additional authors not shown)
Abstract:
This paper introduces the hypothetical track-length fitting algorithm, a novel method for measuring the kinetic energies of ionizing particles in liquid argon time projection chambers (LArTPCs). The algorithm finds the most probable offset in track length for a track-like object by comparing the measured ionization density as a function of position with a theoretical prediction of the energy loss as a function of the energy, including models of electron recombination and detector response. The algorithm can be used to measure the energies of particles that interact before they stop, such as charged pions that are absorbed by argon nuclei. The algorithm's energy measurement resolutions and fractional biases are presented as functions of particle kinetic energy and number of track hits using samples of stopping secondary charged pions in data collected by the ProtoDUNE-SP detector, and also in a detailed simulation. Additional studies describe the impact of the dE/dx model on energy measurement performance. The method described in this paper to characterize the energy measurement performance can be repeated in any LArTPC experiment using stopping secondary charged pions.
Submitted 1 October, 2024; v1 submitted 26 September, 2024;
originally announced September 2024.
-
A Database Engineered System for Big Data Analytics on Tornado Climatology
Authors:
Fengfan Bian,
Carson K. Leung,
Piers Grenier,
Harry Pu,
Samuel Ning,
Alfredo Cuzzocrea
Abstract:
Recognizing the challenges with current tornado warning systems, we investigate alternative approaches. In particular, we present a database engineered system that integrates information from heterogeneous rich data sources, including climatology data for tornadoes and data just before a tornado warning. The system aids in predicting tornado occurrences by identifying the data points that form the basis of a tornado warning. Evaluation on US data highlights the advantages of using a classification forecasting recurrent neural network (RNN) model. The results highlight the effectiveness of our database engineered system for big data analytics on tornado climatology, especially in accurately predicting tornado lead-time, magnitude, and location, contributing to the development of sustainable cities.
Submitted 26 September, 2024;
originally announced September 2024.
-
Do We Need iPhone Moment or Xiaomi Moment for Robots? Design of Affordable Home Robots for Health Monitoring
Authors:
Bo Wei,
Yaya Bian,
Mingcen Gao
Abstract:
In this paper, we study cost-effective home robot solutions designed for home health monitoring. Recent advancements in Artificial Intelligence (AI) have significantly advanced the capabilities of robots, enabling them to understand and interact with their surroundings better and more efficiently. The most common robots currently used in homes are toy robots and cleaning robots. While these are relatively affordable, their functionalities are very limited. On the other hand, humanoid and quadruped robots offer more sophisticated features and capabilities, albeit at a much higher cost. Another category is educational robots, which provide educators with the flexibility to attach various sensors and integrate different design methods with the integrated operating systems. However, the challenge of bridging the gap between affordability and functionality remains. Our research aims to address this by exploring the potential of developing advanced yet affordable and accessible home robots for health monitoring, using edge computing techniques and taking advantage of existing computing resources available to home robots, such as mobile phones.
Submitted 25 September, 2024;
originally announced September 2024.
-
Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)
Authors:
Yuchen Li,
Haoyi Xiong,
Linghe Kong,
Jiang Bian,
Shuaiqiang Wang,
Guihai Chen,
Dawei Yin
Abstract:
Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries. However, traditional LTR models encounter two principal obstacles that lead to suboptimal performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering a diverse range of search query popularities, which hampers their ability to address queries across the popularity spectrum, and (2) inadequately trained models that fail to induce generalized representations for LTR, resulting in overfitting. To address these challenges, we propose a Generative Semi-Supervised Pre-trained (GS2P) LTR model. We conduct extensive offline experiments on both a publicly available dataset and a real-world dataset collected from a large-scale search engine. Furthermore, we deploy GS2P in a large-scale web search engine with realistic traffic, where we observe significant improvements in the real-world application.
Submitted 24 September, 2024;
originally announced September 2024.
-
Magnetic field effects on electroweak phase transition and baryon asymmetry
Authors:
Yuefeng Di,
Ligong Bian,
Rong-Gen Cai
Abstract:
In the early universe, the first-order phase transition may occur in the background of magnetic fields, leading to baryon number asymmetry through chiral anomaly. We have numerically simulated the first-order electroweak phase transition in the background of a magnetic field in a three-dimensional lattice, discovered the phenomenon of Higgs condensation, and for the first time given the relationship between baryon number asymmetry and magnetic field strength. The magnetic field strength required to achieve the matter-antimatter asymmetry by the evolution of magnetohydrodynamics is about $10^{-17}\sim10^{-14}$ Gauss at present depending on the correlation length of the helical magnetic field. Our research provides a strong basis for explaining the baryon number asymmetry with cosmic magnetic fields.
Submitted 24 September, 2024;
originally announced September 2024.
-
Magnetogenesis and baryogenesis during and after electroweak phase transition
Authors:
Hui Liu,
Renhui Qin,
Ligong Bian
Abstract:
In this paper, we investigate the generation of baryon asymmetry of the universe (BAU) following the first-order electroweak phase transition. Our study indicates that the presence of CP-violating operators can lead to the generation of the helical magnetic field, which further induces the growth of the lepton asymmetry and the generation of the BAU as the Universe cools down. This process can yield the observed BAU when the new physics scale is lower than 700 GeV.
Submitted 24 September, 2024;
originally announced September 2024.
-
Beyond Skip Connection: Pooling and Unpooling Design for Elimination Singularities
Authors:
Chengkun Sun,
Jinqian Pan,
Juoli Jin,
Russell Stevens Terry,
Jiang Bian,
Jie Xu
Abstract:
Training deep Convolutional Neural Networks (CNNs) presents unique challenges, including the pervasive issue of elimination singularities: the consistent deactivation of nodes, which leads to degenerate manifolds within the loss landscape. These singularities impede efficient learning by disrupting feature propagation. To mitigate this, we introduce Pool Skip, an architectural enhancement that strategically combines a Max Pooling, a Max Unpooling, a $3\times 3$ convolution, and a skip connection. This configuration helps stabilize the training process and maintain feature integrity across layers. We also propose the Weight Inertia hypothesis, which underpins the development of Pool Skip, providing theoretical insights into mitigating degradation caused by elimination singularities through dimensional and affine compensation. We evaluate our method on a variety of benchmarks, focusing on both 2D natural and 3D medical imaging applications, including tasks such as classification and segmentation. Our findings highlight Pool Skip's effectiveness in facilitating more robust CNN training and improving model performance.
Submitted 19 September, 2024;
originally announced September 2024.
-
GASA-UNet: Global Axial Self-Attention U-Net for 3D Medical Image Segmentation
Authors:
Chengkun Sun,
Russell Stevens Terry,
Jiang Bian,
Jie Xu
Abstract:
Accurate segmentation of multiple organs and the differentiation of pathological tissues in medical imaging are crucial but challenging, especially for nuanced classifications and ambiguous organ boundaries. To tackle these challenges, we introduce GASA-UNet, a refined U-Net-like model featuring a novel Global Axial Self-Attention (GASA) block. This block processes image data as a 3D entity, with each 2D plane representing a different anatomical cross-section. Voxel features are defined within this spatial context, and a Multi-Head Self-Attention (MHSA) mechanism is utilized on extracted 1D patches to facilitate connections across these planes. Positional embeddings (PE) are incorporated into our attention framework, enriching voxel features with spatial context and enhancing tissue classification and organ edge delineation. Our model has demonstrated promising improvements in segmentation performance, particularly for smaller anatomical structures, as evidenced by enhanced Dice scores and Normalized Surface Dice (NSD) on three benchmark datasets, i.e., BTCV, AMOS, and KiTS23.
Submitted 19 September, 2024;
originally announced September 2024.
-
BGDB: Bernoulli-Gaussian Decision Block with Improved Denoising Diffusion Probabilistic Models
Authors:
Chengkun Sun,
Jinqian Pan,
Russell Stevens Terry,
Jiang Bian,
Jie Xu
Abstract:
Generative models can enhance discriminative classifiers by constructing complex feature spaces, thereby improving performance on intricate datasets. Conventional methods typically augment datasets with more detailed feature representations or increase dimensionality to make nonlinear data linearly separable. Utilizing a generative model solely for feature space processing falls short of unlocking its full potential within a classifier and typically lacks a solid theoretical foundation. We base our approach on a novel hypothesis: the probability information (logit) derived from a single model training can be used to generate the equivalent of multiple training sessions. Leveraging the central limit theorem, this synthesized probability information is anticipated to converge toward the true probability more accurately. To achieve this goal, we propose the Bernoulli-Gaussian Decision Block (BGDB), a novel module inspired by the Central Limit Theorem and the concept that the mean of multiple Bernoulli trials approximates the probability of success in a single trial. Specifically, we utilize Improved Denoising Diffusion Probabilistic Models (IDDPM) to model the probability of Bernoulli Trials. Our approach shifts the focus from reconstructing features to reconstructing logits, transforming the logit from a single iteration into logits analogous to those from multiple experiments. We provide the theoretical foundations of our approach through mathematical analysis and validate its effectiveness through experimental evaluation using various datasets for multiple imaging tasks, including both classification and segmentation.
Submitted 19 September, 2024;
originally announced September 2024.
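Illustrative note on the BGDB entry above: the statistical fact the abstract appeals to (the mean of repeated Bernoulli trials approximates the success probability, with Central Limit Theorem fluctuations) can be written out explicitly. This is the standard textbook statement, given here as background and not as the paper's own derivation; the symbols $X_i$, $\bar{X}_n$, $p$ and $n$ are notation assumed for this note.

```latex
X_1,\dots,X_n \overset{\text{i.i.d.}}{\sim} \mathrm{Bernoulli}(p), \qquad
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}\bigl[\bar{X}_n\bigr] = p, \qquad
\mathrm{Var}\bigl(\bar{X}_n\bigr) = \frac{p(1-p)}{n},
\qquad
\sqrt{n}\,\bigl(\bar{X}_n - p\bigr) \xrightarrow{\;d\;} \mathcal{N}\bigl(0,\; p(1-p)\bigr)
\quad \text{as } n \to \infty .
```

The standard deviation of $\bar{X}_n$ shrinks like $1/\sqrt{n}$, which is why averaging the outcomes of many trials gives a sharper estimate of $p$ than a single trial.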
-
MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation
Authors:
Chenyuan Bian,
Nan Xia,
Xia Yang,
Feifei Wang,
Fengjiao Wang,
Bin Wei,
Qian Dong
Abstract:
Deep learning, particularly convolutional neural networks (CNNs) and Transformers, has significantly advanced 3D medical image segmentation. While CNNs are highly effective at capturing local features, their limited receptive fields may hinder performance in complex clinical scenarios. In contrast, Transformers excel at modeling long-range dependencies but are computationally intensive, making them expensive to train and deploy. Recently, the Mamba architecture, based on the State Space Model (SSM), has been proposed to efficiently model long-range dependencies while maintaining linear computational complexity. However, its application in medical image segmentation reveals shortcomings, particularly in capturing critical local features essential for accurate delineation of clinical regions. In this study, we propose MambaClinix, a novel U-shaped architecture for medical image segmentation that integrates a hierarchical gated convolutional network (HGCN) with Mamba in an adaptive stage-wise framework. This design significantly enhances computational efficiency and high-order spatial interactions, enabling the model to effectively capture both proximal and distal relationships in medical images. Specifically, our HGCN is designed to mimic the attention mechanism of Transformers with a purely convolutional structure, facilitating high-order spatial interactions in feature maps while avoiding the computational complexity typically associated with Transformer-based methods. Additionally, we introduce a region-specific Tversky loss, which emphasizes specific pixel regions to improve auto-segmentation performance, thereby optimizing the model's decision-making process. Experimental results on five benchmark datasets demonstrate that the proposed MambaClinix achieves high segmentation accuracy while maintaining low model complexity.
Submitted 19 September, 2024;
originally announced September 2024.
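Illustrative note on the MambaClinix entry above: the abstract mentions a region-specific Tversky loss. The sketch below shows only the standard (non-region-specific) Tversky loss on flattened soft predictions, as background. How the paper weights specific pixel regions is not stated in the abstract and is not reproduced here. The function name `tversky_loss` and the parameters `alpha`, `beta` and `eps` are assumptions made for this example.

```python
def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Standard Tversky loss for one class on flattened inputs.

    pred   : iterable of predicted foreground probabilities in [0, 1]
    target : iterable of ground-truth labels (0 or 1), same length as pred
    alpha  : weight on false positives
    beta   : weight on false negatives
    With alpha = beta = 0.5 the Tversky index equals the Dice coefficient.
    """
    pairs = list(zip(pred, target))
    tp = sum(p * t for p, t in pairs)        # soft true positives
    fp = sum(p * (1 - t) for p, t in pairs)  # soft false positives
    fn = sum((1 - p) * t for p, t in pairs)  # soft false negatives
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


# An over-segmented prediction: one true positive and two false positives.
pred = [1.0, 1.0, 1.0, 0.0]
target = [1, 0, 0, 0]
print(tversky_loss(pred, target, alpha=0.3, beta=0.7))  # lighter penalty on false positives
print(tversky_loss(pred, target, alpha=0.7, beta=0.3))  # heavier penalty on false positives
```

Raising `alpha` relative to `beta` penalizes false positives more heavily, and raising `beta` penalizes missed foreground more heavily, which is the usual reason to prefer this loss over plain Dice for small or imbalanced structures.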
-
Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification
Authors:
Jiaxing Xu,
Kai He,
Mengcheng Lan,
Qingtian Bian,
Wei Li,
Tieying Li,
Yiping Ke,
Miao Qiao
Abstract:
Understanding neurological disorders is a fundamental problem in neuroscience, which often requires the analysis of brain networks derived from functional magnetic resonance imaging (fMRI) data. Despite the prevalence of Graph Neural Networks (GNNs) and Graph Transformers in various domains, applying them to brain networks faces challenges. Specifically, the datasets are severely impacted by noise caused by distribution shifts across sub-populations and by the neglect of node identities, both of which obstruct the identification of disease-specific patterns. To tackle these challenges, we propose Contrasformer, a novel contrastive brain network Transformer. It generates a prior-knowledge-enhanced contrast graph to address the distribution shifts across sub-populations by a two-stream attention mechanism. A cross attention with identity embedding highlights the identity of nodes, and three auxiliary losses ensure group consistency. Evaluated on 4 functional brain network datasets over 4 different diseases, Contrasformer outperforms the state-of-the-art methods for brain networks by achieving up to 10.8% improvement in accuracy, which demonstrates its efficacy in neurological disorder identification. Case studies illustrate its interpretability, especially in the context of neuroscience. This paper provides a solution for analyzing brain networks, offering valuable insights into neurological disorders. Our code is available at https://github.com/AngusMonroe/Contrasformer.
Submitted 17 September, 2024;
originally announced September 2024.
-
Theory of optical spinpolarization of axial divacancy and nitrogen-vacancy defects in 4H-SiC
Authors:
Guodong Bian,
Gergő Thiering,
Ádám Gali
Abstract:
The neutral divacancy and the negatively charged nitrogen-vacancy defects in 4H-silicon carbide (SiC) are two of the most prominent candidates for functioning as room-temperature quantum bits (qubits) with telecommunication-wavelength emission. Nonetheless, the pivotal role of electron-phonon coupling in the spinpolarization loop has not yet been revealed. In this work, we theoretically investigate the microscopic magneto-optical properties and spin-dependent optical loops using first-principles calculations. First, we quantitatively demonstrate the electronic level structure, assisted by symmetry analysis. Moreover, the fine interactions, including spin-orbit coupling and spin-spin interaction, are fully characterized to provide versatile qubit functional parameters. Subsequently, we explore the electron-phonon coupling, encompassing dynamics- and pseudo-Jahn-Teller effects in the intersystem crossing transition. In addition, we analyze the photoluminescence (PL) lifetime based on the major transition rates in the optical spinpolarization loop. We compare two promising qubits that have similar electronic properties but substantially different transition rates. Finally, we detail the threshold of ODMR contrast for further optimization of the qubit operation. This work not only reveals the mechanism underlying the optical spinpolarization but also proposes productive avenues for optimizing quantum information processing tasks based on the ODMR protocol.
Submitted 16 September, 2024;
originally announced September 2024.
-
Approximation of divergence-free vector fields vanishing on rough planar sets
Authors:
Giacomo Del Nin,
Bian Wu
Abstract:
Given any divergence-free vector field of Sobolev class $W^{m,p}_0(Ω)$ in a bounded open subset $Ω\subset \mathbb{R}^2$, we are interested in approximating it in the $W^{m,p}$ norm with divergence-free smooth vector fields compactly supported in $Ω$. We show that this approximation property holds in the following cases: For $p>2$, this holds given that $\partial Ω$ has zero Lebesgue measure (a weaker but more technical condition is sufficient); For $p \leq 2$, this holds if $Ω^c$ can be decomposed into finitely many disjoint closed sets, each of which is connected or $d$-Ahlfors regular for some $d\in[0,2)$. This has links to the uniqueness of weak solutions to the Stokes equation in $Ω$. For Hölder spaces, we prove this approximation property in general bounded domains.
Submitted 19 September, 2024; v1 submitted 15 September, 2024;
originally announced September 2024.
-
WeatherReal: A Benchmark Based on In-Situ Observations for Evaluating Weather Models
Authors:
Weixin Jin,
Jonathan Weyn,
Pengcheng Zhao,
Siqi Xiang,
Jiang Bian,
Zuliang Fang,
Haiyu Dong,
Hongyu Sun,
Kit Thambiratnam,
Qi Zhang
Abstract:
In recent years, AI-based weather forecasting models have matched or even outperformed numerical weather prediction systems. However, most of these models have been trained and evaluated on reanalysis datasets like ERA5. These datasets, being products of numerical models, often diverge substantially from actual observations in some crucial variables like near-surface temperature, wind, precipitation and clouds - parameters that hold significant public interest. To address this divergence, we introduce WeatherReal, a novel benchmark dataset for weather forecasting, derived from global near-surface in-situ observations. WeatherReal also features a publicly accessible quality control and evaluation framework. This paper details the sources and processing methodologies underlying the dataset, and further illustrates the advantage of in-situ observations in capturing hyper-local and extreme weather through comparative analyses and case studies. Using WeatherReal, we evaluated several data-driven models and compared them with leading numerical models. Our work aims to advance the AI-based weather forecasting research towards a more application-focused and operation-ready approach.
Submitted 14 September, 2024;
originally announced September 2024.
-
PeriGuru: A Peripheral Robotic Mobile App Operation Assistant based on GUI Image Understanding and Prompting with LLM
Authors:
Kelin Fu,
Yang Tian,
Kaigui Bian
Abstract:
Smartphones have significantly enhanced our daily learning, communication, and entertainment, becoming an essential component of modern life. However, certain populations, including the elderly and individuals with disabilities, encounter challenges in utilizing smartphones, thus necessitating mobile app operation assistants, a.k.a. mobile app agent. With considerations for privacy, permissions, and cross-platform compatibility issues, we endeavor to devise and develop PeriGuru in this work, a peripheral robotic mobile app operation assistant based on GUI image understanding and prompting with Large Language Model (LLM). PeriGuru leverages a suite of computer vision techniques to analyze GUI screenshot images and employs LLM to inform action decisions, which are then executed by robotic arms. PeriGuru achieves a success rate of 81.94% on the test task set, which surpasses by more than double the method without PeriGuru's GUI image interpreting and prompting design. Our code is available on https://github.com/Z2sJ4t/PeriGuru.
△ Less
Submitted 14 September, 2024;
originally announced September 2024.
-
USTC-TD: A Test Dataset and Benchmark for Image and Video Coding in 2020s
Authors:
Zhuoyuan Li,
Junqi Liao,
Chuanbo Tang,
Haotian Zhang,
Yuqi Li,
Yifan Bian,
Xihua Sheng,
Xinmin Feng,
Yao Li,
Changsheng Gao,
Li Li,
Dong Liu,
Feng Wu
Abstract:
Image/video coding has been a remarkable research area for both academia and industry for many years. Test datasets, especially high-quality image/video datasets, are desirable for the justified evaluation of coding-related research, practical applications, and standardization activities. We put forward a test dataset, namely USTC-TD, which has been successfully adopted in the practical end-to-end image/video coding challenge of the IEEE International Conference on Visual Communications and Image Processing in 2022 and 2023. USTC-TD contains 40 images at 4K spatial resolution and 10 video sequences at 1080p spatial resolution, featuring varied content due to diverse environmental factors (scene type, texture, motion, view) and designed imaging factors (illumination, shadow, lens). We quantitatively evaluate USTC-TD on different image/video features (spatial, temporal, color, lightness) and compare it with previous image/video test datasets, which verifies the wider coverage and greater diversity of the proposed dataset. We also evaluate both classic standardized and recent learned image/video coding schemes on USTC-TD with PSNR and MS-SSIM, and provide an extensive benchmark for the evaluated schemes. Based on the characteristics and specific design of the proposed test dataset, we analyze the benchmark performance and shed light on future research and development of image/video coding. All the data are released online: https://esakak.github.io/USTC-TD.
Submitted 12 September, 2024;
originally announced September 2024.
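The USTC-TD abstract above reports benchmark results in PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR definition for 8-bit samples. The function name `psnr`, the flat-list input format, and the default peak value of 255 are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `reference` and `distorted` are flat sequences of sample values;
    `peak` is the maximum possible sample value (255 for 8-bit content).
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    decoded = [110, 90, 110, 90]
    print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

For video, a per-frame PSNR is commonly averaged over the sequence, but the exact aggregation used in the benchmark is not stated in the abstract.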
-
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Authors:
Junjie Li,
Yang Liu,
Weiqing Liu,
Shikai Fang,
Lewen Wang,
Chang Xu,
Jiang Bian
Abstract:
Generative models aim to simulate realistic effects of various actions across different contexts, from text generation to visual effects. Despite efforts to build real-world simulators, leveraging generative models for virtual worlds, like financial markets, remains underexplored. In financial markets, generative models can simulate market effects of various behaviors, enabling interaction with market scenes and players, and training strategies without financial risk. This simulation relies on the finest-grained structured data in financial markets, such as orders, and thereby builds the most fine-grained realistic simulation. We propose Large Market Model (LMM), an order-level generative foundation model, for financial market simulation, akin to language modeling in the digital world. Our financial Market Simulation engine (MarS), powered by LMM, addresses the need for realistic, interactive and controllable order generation. Key objectives of this paper include evaluating LMM's scaling law in financial markets, assessing MarS's realism, balancing controlled generation with market impact, and demonstrating MarS's potential applications. We showcase MarS as a forecast tool, detection system, analysis platform, and agent training environment. Our contributions include pioneering a generative model for financial markets, designing MarS to meet domain-specific needs, and demonstrating the industry potential of MarS-based applications.
Submitted 4 September, 2024;
originally announced September 2024.
-
Low-energy critical behavior in two-dimensional tilted semi-Dirac semimetals driven by fermion-fermion interactions
Authors:
Wen Liu,
Wen-Hao Bian,
Xiao-Zhuo Chu,
Jing Wang
Abstract:
Employing the renormalization group (RG) approach, we carefully investigate the critical behavior of two-dimensional tilted semi-Dirac semimetals induced by fermion-fermion interactions in the low-energy regime. After incorporating all one-loop corrections, we derive the coupled RG equations of all related parameters and introduce two distinct strategies, named Strategy I and Strategy II, to describe different scenarios. A detailed numerical analysis yields several interesting behaviors in the low-energy limit. First, we notice that the fermion-fermion interactions either vanish or diverge in Strategy I, depending on the initial values of the tilting parameter and the fermionic couplings, whereas in Strategy II these interactions always diverge at a certain critical energy scale, which is associated with the initial conditions. Next, the microstructural parameter $α$ and the fermion velocity $v_F$ in Strategy I behave similarly to their Strategy II counterparts: fermion-fermion interactions lead to an increase in $α$ while driving a decrease in $v_F$. Furthermore, the system can be attracted either by the Gaussian fixed point (GFP) or by a certain relatively fixed point (RFP) in Strategy I. However, it always flows towards the RFP in Strategy II in the lowest-energy limit. These results should provide helpful insights into studies of observable quantities and phase transitions in two-dimensional tilted semi-Dirac semimetals and analogous semimetals.
Submitted 11 September, 2024;
originally announced September 2024.
-
Enhancing Cross-domain Pre-Trained Decision Transformers with Adaptive Attention
Authors:
Wenhao Zhao,
Qiushui Xu,
Linjie Xu,
Lei Song,
Jinyu Wang,
Chunlai Zhou,
Jiang Bian
Abstract:
Recently, the pre-training of decision transformers (DT) using a different domain, such as natural language text, has generated significant attention in offline reinforcement learning (Offline RL). Although this cross-domain pre-training approach achieves superior performance compared to training from scratch in environments requiring short-term planning ability, the mechanisms by which pre-training benefits the fine-tuning phase remain unclear. Furthermore, we point out that the cross-domain pre-training approach hinders the extraction of distant information in environments like PointMaze that require long-term planning ability, leading to performance that is much worse than training DT from scratch. This work first analyzes these issues and finds that Markov Matrix, a component that exists in pre-trained attention heads, is the key to explaining the significant performance disparity of pre-trained models across different planning abilities. Inspired by our analysis, we propose a general method, GPT-DTMA, which equips a pre-trained DT with Mixture of Attention (MoA) to enable adaptive learning and accommodate diverse attention requirements during fine-tuning. Extensive experiments demonstrate the effectiveness of GPT-DTMA: it achieves superior performance in short-term environments compared to baselines, and in long-term environments, it mitigates the negative impact caused by Markov Matrix, achieving results comparable to those of DT trained from scratch.
Submitted 10 September, 2024;
originally announced September 2024.
-
RATNUS: Rapid, Automatic Thalamic Nuclei Segmentation using Multimodal MRI inputs
Authors:
Anqi Feng,
Zhangxing Bian,
Blake E. Dewey,
Alexa Gail Colinco,
Jiachen Zhuo,
Jerry L. Prince
Abstract:
Accurate segmentation of thalamic nuclei is important for better understanding brain function and improving disease treatment. Traditional segmentation methods often rely on a single T1-weighted image, which has limited contrast in the thalamus. In this work, we introduce RATNUS, which uses synthetic T1-weighted images with many inversion times along with diffusion-derived features to enhance the visibility of nuclei within the thalamus. Using these features, a convolutional neural network is used to segment 13 thalamic nuclei. For comparison with other methods, we introduce a unified nuclei labeling scheme. Our results demonstrate an 87.19% average true positive rate (TPR) against manual labeling. In comparison, FreeSurfer and THOMAS achieve TPRs of 64.25% and 57.64%, respectively, demonstrating the superiority of RATNUS in thalamic nuclei segmentation.
Submitted 10 September, 2024;
originally announced September 2024.
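The RATNUS abstract above compares methods by average true positive rate (TPR) against manual labels. A minimal sketch of how such a figure could be computed is given below. It assumes the average is an unweighted mean of per-nucleus TPRs over voxels, which the abstract does not confirm; the function names and the flat label lists are hypothetical.

```python
def true_positive_rate(manual, predicted, label):
    """Fraction of voxels manually labeled `label` that the prediction also labels `label`."""
    hits = [p == label for m, p in zip(manual, predicted) if m == label]
    if not hits:
        raise ValueError(f"label {label!r} does not occur in the manual labels")
    return sum(hits) / len(hits)


def mean_tpr(manual, predicted, labels):
    """Unweighted mean of per-label TPRs (an assumed averaging scheme)."""
    return sum(true_positive_rate(manual, predicted, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Toy example: 0 is background, 1 and 2 stand in for two nuclei.
    manual = [1, 1, 2, 2, 2, 0]
    predicted = [1, 2, 2, 2, 0, 0]
    print(f"mean TPR: {mean_tpr(manual, predicted, labels=[1, 2]):.4f}")
```

An unweighted mean treats small and large nuclei equally; a voxel-weighted average would favor the larger nuclei.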
-
A Comprehensive Framework for Estimating Aircraft Fuel Consumption Based on Flight Trajectories
Authors:
Linfeng Zhang,
Alex Bian,
Changmin Jiang,
Lingxiao Wu
Abstract:
Accurate calculation of aircraft fuel consumption plays an irreplaceable role in flight operations, optimization, and pollutant accounting. Calculating aircraft fuel consumption accurately is tricky because it changes based on different flying conditions and physical factors. Utilizing flight surveillance data, this study developed a comprehensive mathematical framework and established a link between flight dynamics and fuel consumption, providing a set of high-precision, high-resolution fuel calculation methods. It also allows other practitioners to select data sources according to specific needs through this framework. The methodology begins by addressing the functional aspects of interval fuel consumption. We apply spectral transformation techniques to mine Automatic Dependent Surveillance-Broadcast (ADS-B) data, identifying key aspects of the flight profile and establishing their theoretical relationships with fuel consumption. Subsequently, a deep neural network with tunable parameters is used to fit this multivariate function, facilitating high-precision calculations of interval fuel consumption. Furthermore, a second-order smooth monotonic interpolation method was constructed along with a novel estimation method for instantaneous fuel consumption. Numerical results have validated the effectiveness of the model. Using ADS-B and Aircraft Communications Addressing and Reporting System (ACARS) data from 2023 for testing, the average error of interval fuel consumption can be reduced to as low as $3.31\%$, and the error in the integral sense of instantaneous fuel consumption is $8.86\%$. These results establish this model as the state of the art, achieving the lowest estimation errors in aircraft fuel consumption calculations to date.
Submitted 10 September, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Two Channels of Metal-Rich Compact Stellar System Formation: Starbursts Under High Ram Pressure vs. Tidal Stripping
Authors:
Yuan Bian,
Min Du,
Victor P. Debattista,
Dylan Nelson,
Mark A. Norris,
Luis C. Ho,
Shuai Lu,
Renyue Cen,
Shuo Ma,
Chong Ge,
Taotao Fang,
Hui Li
Abstract:
Most galaxies follow well-defined scaling relations of metallicity and stellar mass; however, some outliers at the low mass end of the observed galaxy population exhibit unusually high metallicity for their mass. Understanding how these objects get to be so metal-rich is vital for understanding the role of feedback in galaxy formation. Using the TNG50 simulation, we explore the origins of this phenomenon. We identify 227 metal-rich, Compact Stellar Systems (CSSs) that deviate significantly from this scaling relation. These CSSs are satellites located in the vicinity of massive host galaxies, with stellar masses ranging from $10^{8} M_{\odot}$ to $10^{10} M_{\odot}$ (including six systems that are close analogs of the M31-M32 system). Contrary to the previously assumed scenario that such objects are predominantly products of tidal stripping, our results suggest a more prevalent role for ram pressure in their formation. Indeed, 76\% (173) of these CSSs are formed through a burst of star formation occurring around the time of the first pericentric passage, typically at redshifts $z\lesssim1$, aided by strong ram pressure and tidal forces. The high ram pressure, resulting from the CSSs' rapid motion near the halo center, facilitates metal enrichment, producing high-metallicity CSSs by confining the metal-rich gas from bursty star formation, which leads to distinct stellar populations characterized by enhanced metallicity as well as high $α$-abundance. Only the remaining 24\% (54) of metal-rich CSSs are generated through the tidal stripping of massive progenitors. Our results further indicate that M32 is more likely to have formed through intense star formation events rather than through gradual, tidal stripping, thereby providing crucial insights into the nature of low mass, compact galaxy formation.
Submitted 8 September, 2024;
originally announced September 2024.
-
Learning to Learn Transferable Generative Attack for Person Re-Identification
Authors:
Yuan Bian,
Min Liu,
Xueping Wang,
Yunfeng Ma,
Yaonan Wang
Abstract:
Deep learning-based person re-identification (re-id) models are widely employed in surveillance systems and inevitably inherit the vulnerability of deep networks to adversarial attacks. Existing attacks merely consider cross-dataset and cross-model transferability, ignoring the cross-test capability to perturb models trained in different domains. To rigorously examine the robustness of real-world re-id models, the Meta Transferable Generative Attack (MTGA) method is proposed, which adopts meta-learning optimization to encourage the generative attacker to produce highly transferable adversarial examples by learning comprehensively simulated transfer-based cross-model\&dataset\&test black-box meta attack tasks. Specifically, cross-model\&dataset black-box attack tasks are first mimicked by selecting different re-id models and datasets for the meta-train and meta-test attack processes. As different models may focus on different feature regions, the Perturbation Random Erasing module is further devised to prevent the attacker from learning to corrupt only model-specific features. To help the attacker acquire cross-test transferability, the Normalization Mix strategy is introduced to imitate diverse feature embedding spaces by mixing multi-domain statistics of target models. Extensive experiments show the superiority of MTGA: in cross-model\&dataset and cross-model\&dataset\&test attacks, MTGA outperforms the SOTA methods by 21.5\% and 11.3\% on mean mAP drop rate, respectively. The code of MTGA will be released after the paper is accepted.
Submitted 6 September, 2024;
originally announced September 2024.
-
End-to-end Multi-source Visual Prompt Tuning for Survival Analysis in Whole Slide Images
Authors:
Zhongwei Qiu,
Hanqing Chao,
Wenbin Liu,
Yixuan Shen,
Le Lu,
Ke Yan,
Dakai Jin,
Yun Bian,
Hui Jiang
Abstract:
Survival analysis using pathology images poses a considerable challenge, as it requires the localization of relevant information from the multitude of tiles within whole slide images (WSIs). Current methods typically resort to a two-stage approach, where a pre-trained network extracts features from tiles, which are then used by survival models. This process, however, does not optimize the survival models in an end-to-end manner, and the pre-extracted features may not be ideally suited for survival prediction. To address this limitation, we present a novel end-to-end Visual Prompt Tuning framework for survival analysis, named VPTSurv. VPTSurv refines feature embeddings through an efficient encoder-decoder framework. The encoder remains fixed while the framework introduces tunable visual prompts and adaptors, thus permitting end-to-end training specifically for survival prediction by optimizing only the lightweight adaptors and the decoder. Moreover, the versatile VPTSurv framework accommodates multi-source information as prompts, thereby enriching the survival model. VPTSurv achieves substantial increases of 8.7% and 12.5% in the C-index on two immunohistochemical pathology image datasets. These significant improvements highlight the transformative potential of the end-to-end VPT framework over traditional two-stage methods.
Submitted 5 September, 2024;
originally announced September 2024.
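The VPTSurv abstract above reports gains in the C-index. For orientation, the sketch below implements Harrell's concordance index, one common definition of that metric. Which variant the authors use is not stated in the abstract, so this choice, along with the function name and argument layout, is an assumption.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5, 8, 12, 20]
    events = [1, 0, 1, 0]
    risks = [0.9, 0.4, 0.6, 0.1]
    print(f"C-index: {concordance_index(times, events, risks):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of risks against observed event times.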
-
Serverless Query Processing with Flexible Performance SLAs and Prices
Authors:
Haoqiong Bian,
Dongyang Geng,
Yunpeng Chai,
Anastasia Ailamaki
Abstract:
Serverless query processing has become increasingly popular due to its auto-scaling, high elasticity, and pay-as-you-go pricing. It allows cloud data warehouse (or lakehouse) users to focus on data analysis without the burden of managing systems and resources. Accordingly, in serverless query services, users become more concerned about cost-efficiency under acceptable performance than performance under fixed resources. This poses new challenges for serverless query engine design in providing flexible performance service-level agreements (SLAs) and cost-efficiency (i.e., prices).
In this paper, we first define the problem of flexible performance SLAs and prices in serverless query processing and discuss its significance. Then, we envision the challenges and solutions for solving this problem and the opportunities it raises for other database research. Finally, we present PixelsDB, an open-source prototype with three service levels supported by dedicated architectural designs. Evaluations show that PixelsDB reduces resource costs by 65.5% for near-real-world workloads generated by Cloud Analytics Benchmark (CAB) while not violating the pending time guarantees.
Submitted 2 September, 2024;
originally announced September 2024.
-
Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling
Authors:
Haicheng Liao,
Yongkang Li,
Chengyue Wang,
Songning Lai,
Zhenning Li,
Zilin Bian,
Jaeyoung Lee,
Zhiyong Cui,
Guohui Zhang,
Chengzhong Xu
Abstract:
The primary goal of traffic accident anticipation is to foresee potential accidents in real time using dashcam videos, a task that is pivotal for enhancing the safety and reliability of autonomous driving technologies. In this study, we introduce an innovative framework, AccNet, which significantly advances the prediction capabilities beyond the current state-of-the-art (SOTA) 2D-based methods by incorporating monocular depth cues for sophisticated 3D scene modeling. Addressing the prevalent challenge of skewed data distribution in traffic accident datasets, we propose the Binary Adaptive Loss for Early Anticipation (BA-LEA). This novel loss function, together with a multi-task learning strategy, shifts the focus of the predictive model towards the critical moments preceding an accident. We rigorously evaluate the performance of our framework on three benchmark datasets--Dashcam Accident Dataset (DAD), Car Crash Dataset (CCD), and AnAn Accident Detection (A3D), and DADA-2000 Dataset--demonstrating its superior predictive accuracy through key metrics such as Average Precision (AP) and mean Time-To-Accident (mTTA).
Submitted 2 September, 2024;
originally announced September 2024.
-
Compositional 3D-aware Video Generation with LLM Director
Authors:
Hanxin Zhu,
Tianyu He,
Anni Tang,
Junliang Guo,
Zhibo Chen,
Jiang Bian
Abstract:
Significant progress has been made in text-to-video generation through the use of powerful generative models and large-scale internet data. However, substantial challenges remain in precisely controlling individual concepts within the generated video, such as the motion and appearance of specific characters and the movement of viewpoints. In this work, we propose a novel paradigm that generates each concept in 3D representation separately and then composes them with priors from Large Language Models (LLM) and 2D diffusion models. Specifically, given an input textual prompt, our scheme consists of three stages: 1) We leverage an LLM as the director to first decompose the complex query into several sub-prompts that indicate individual concepts within the video (e.g., scene, objects, motions), then we let the LLM invoke pre-trained expert models to obtain the corresponding 3D representations of the concepts. 2) To compose these representations, we prompt a multi-modal LLM to produce coarse guidance on the scales and coordinates of trajectories for the objects. 3) To make the generated frames adhere to the natural image distribution, we further leverage 2D diffusion priors and use Score Distillation Sampling to refine the composition. Extensive experiments demonstrate that our method can generate high-fidelity videos from text with diverse motion and flexible control over each concept. Project page: https://aka.ms/c3v.
Submitted 31 August, 2024;
originally announced September 2024.
-
An Optimal Control Approach for Inverse Problems with Deep Learnable Regularizers
Authors:
Wanyu Bian
Abstract:
This paper introduces an optimal control framework to address the inverse problem using a learned regularizer, with applications in image reconstruction. We build upon the concept of Learnable Optimization Algorithms (LOA), which combine deep learning with traditional optimization schemes to improve convergence and stability in image reconstruction tasks such as CT and MRI. Our approach reformulates the inverse problem as a variational model where the regularization term is parameterized by a deep neural network (DNN). By viewing the parameter learning process as an optimal control problem, we leverage Pontryagin's Maximum Principle (PMP) to derive necessary conditions for optimality. We propose the Method of Successive Approximations (MSA) to iteratively solve the control problem, optimizing both the DNN parameters and the reconstructed image. Additionally, we introduce an augmented reverse-state method to enhance memory efficiency without compromising the convergence guarantees of the LOA framework.
Submitted 31 August, 2024;
originally announced September 2024.
-
Boundary driven instabilities of Couette flows
Authors:
Dongfen Bian,
Emmanuel Grenier,
Nader Masmoudi,
Weiren Zhao
Abstract:
In this article, we prove that the threshold of instability of the classical Couette flow in $H^s$ for large $s$ is $ν^{1/2}$. The instability is completely driven by the boundary. The dynamics of the flow create a Prandtl-type boundary layer of width $ν^{1/2}$, which is itself linearly unstable. This leads to a secondary instability, which in turn creates a sub-layer.
Submitted 30 August, 2024;
originally announced September 2024.
-
On-device Learning of EEGNet-based Network For Wearable Motor Imagery Brain-Computer Interface
Authors:
Sizhen Bian,
Pixi Kang,
Julian Moosmann,
Mengxi Liu,
Pietro Bonazzi,
Roman Rosipal,
Michele Magno
Abstract:
Electroencephalogram (EEG)-based Brain-Computer Interfaces (BCIs) have garnered significant interest across various domains, including rehabilitation and robotics. Despite advancements in neural network-based EEG decoding, maintaining performance across diverse user populations remains challenging due to feature distribution drift. This paper presents an effective approach to address this challenge by implementing a lightweight and efficient on-device learning engine for wearable motor imagery recognition. The proposed approach, applied to the well-established EEGNet architecture, enables real-time and accurate adaptation to EEG signals from unregistered users. Leveraging the newly released low-power parallel RISC-V-based processor, GAP9 from GreenWaves, and the Physionet EEG Motor Imagery dataset, we demonstrate a remarkable accuracy gain of up to 7.31\% with respect to the baseline with a memory footprint of 15.6 KByte. Furthermore, by optimizing the input stream, we achieve enhanced real-time performance without compromising inference accuracy. Our tailored approach requires 14.9 ms and 0.76 mJ per single inference, and 20 µs and 0.83 µJ per single update during online training. These findings highlight the feasibility of our method for edge EEG devices as well as other battery-powered wearable AI systems suffering from subject-dependent feature distribution drift.
Submitted 25 August, 2024;
originally announced September 2024.