-
Influence Functions for Scalable Data Attribution in Diffusion Models
Authors:
Bruno Mlodozeniec,
Runa Eschenhagen,
Juhan Bae,
Alexander Immer,
David Krueger,
Richard Turner
Abstract:
Diffusion models have led to significant advancements in generative modelling. Yet their widespread adoption poses challenges regarding data attribution and interpretability. In this paper, we aim to help address such challenges in diffusion models by developing an influence functions framework. Influence function-based data attribution methods approximate how a model's output would have changed if some training data were removed. In supervised learning, this is usually used for predicting how the loss on a particular example would change. For diffusion models, we focus on predicting the change in the probability of generating a particular example via several proxy measurements. We show how to formulate influence functions for such quantities and how previously proposed methods can be interpreted as particular design choices in our framework. To ensure scalability of the Hessian computations in influence functions, we systematically develop K-FAC approximations based on generalised Gauss-Newton matrices specifically tailored to diffusion models. We recast previously proposed methods as specific design choices in our framework and show that our recommended method outperforms previous data attribution approaches on common evaluations, such as the Linear Data-modelling Score (LDS) or retraining without top influences, without the need for method-specific hyperparameter tuning.
Submitted 17 October, 2024;
originally announced October 2024.
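As a rough illustration of the influence-function idea summarized in the abstract above (approximating how a trained model would change if a training example were removed), the following minimal sketch uses a one-parameter ridge regression rather than a diffusion model. The toy data, the regularization strength `lam`, and all function names are assumptions made for illustration only; the sketch does not reproduce the paper's K-FAC approximations or its proxy measurements.

```python
# Toy influence-function sketch (NOT the paper's method): 1-D ridge regression.
# Objective: (1/N) * sum_i 0.5 * (theta * x_i - y_i)**2 + 0.5 * lam * theta**2


def make_data(n=50):
    """Deterministic toy data: y is roughly 2*x plus a small bounded perturbation."""
    xs = [(i % 7) - 2.5 for i in range(n)]
    ys = [2.0 * x + 0.1 * ((i * 37) % 11 - 5) for i, x in enumerate(xs)]
    return xs, ys


def fit(xs, ys, lam, n_total):
    """Closed-form minimizer; n_total keeps the 1/N normalization fixed.

    Returns (theta, hessian), where the Hessian is a scalar in this 1-D problem.
    """
    s = sum(x * y for x, y in zip(xs, ys)) / n_total
    h = sum(x * x for x in xs) / n_total + lam
    return s / h, h


def influence_estimate(theta, h, x_j, y_j, n):
    """First-order estimate of the parameter change when example j is removed:
    delta_theta ~= (1/N) * H^{-1} * grad_theta loss_j(theta).
    """
    grad_j = (theta * x_j - y_j) * x_j
    return grad_j / (n * h)


def exact_change(xs, ys, lam, j):
    """Parameter change obtained by actually refitting without example j."""
    n = len(xs)
    theta, _ = fit(xs, ys, lam, n)
    theta_without_j, _ = fit(xs[:j] + xs[j + 1:], ys[:j] + ys[j + 1:], lam, n)
    return theta_without_j - theta


xs, ys = make_data()
lam = 0.1
theta, h = fit(xs, ys, lam, len(xs))
j = 6  # an example with a large |x|, hence comparatively high leverage
print("influence estimate:", influence_estimate(theta, h, xs[j], ys[j], len(xs)))
print("exact retraining:  ", exact_change(xs, ys, lam, j))
```

In this quadratic toy problem the estimate differs from exact retraining only by a second-order term, which is why the two printed values should be close. For deep models the Hessian is too large to invert directly, which is the scalability issue the abstract says its K-FAC approximations are meant to address.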
-
Tex4D: Zero-shot 4D Scene Texturing with Video Diffusion Models
Authors:
Jingzhi Bao,
Xueting Li,
Ming-Hsuan Yang
Abstract:
3D meshes are widely used in computer vision and graphics for their efficiency in animation and minimal memory use, playing a crucial role in movies, games, AR, and VR. However, creating temporally consistent and realistic textures for mesh sequences remains labor-intensive for professional artists. On the other hand, while video diffusion models excel at text-driven video generation, they often lack 3D geometry awareness and struggle with achieving multi-view consistent texturing for 3D meshes. In this work, we present Tex4D, a zero-shot approach that integrates inherent 3D geometry knowledge from mesh sequences with the expressiveness of video diffusion models to produce multi-view and temporally consistent 4D textures. Given an untextured mesh sequence and a text prompt as inputs, our method enhances multi-view consistency by synchronizing the diffusion process across different views through latent aggregation in the UV space. To ensure temporal consistency, we leverage prior knowledge from a conditional video generation model for texture synthesis. However, straightforwardly combining the video diffusion model and the UV texture aggregation leads to blurry results. We analyze the underlying causes and propose a simple yet effective modification to the DDIM sampling process to address this issue. Additionally, we introduce a reference latent texture to strengthen the correlation between frames during the denoising process. To the best of our knowledge, Tex4D is the first method specifically designed for 4D scene texturing. Extensive experiments demonstrate its superiority in producing multi-view and multi-frame consistent videos based on untextured mesh sequences.
Submitted 14 October, 2024;
originally announced October 2024.
-
Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
Authors:
Sukwon Yun,
Inyoung Choi,
Jie Peng,
Yangfan Wu,
Jingxuan Bao,
Qiyiwen Zhang,
Jiayi Xin,
Qi Long,
Tianlong Chen
Abstract:
Multimodal learning has gained increasing importance across various fields, offering the ability to integrate data from diverse sources such as images, text, and personalized records, which are frequently observed in medical domains. However, in scenarios where some modalities are missing, many existing frameworks struggle to accommodate arbitrary modality combinations, often relying heavily on a single modality or complete data. This oversight of potential modality combinations limits their applicability in real-world situations. To address this challenge, we propose Flex-MoE (Flexible Mixture-of-Experts), a new framework designed to flexibly incorporate arbitrary modality combinations while maintaining robustness to missing data. The core idea of Flex-MoE is to first address missing modalities using a new missing modality bank that integrates observed modality combinations with the corresponding missing ones. This is followed by a uniquely designed Sparse MoE framework. Specifically, Flex-MoE first trains experts using samples with all modalities to inject generalized knowledge through the generalized router ($\mathcal{G}$-Router). The $\mathcal{S}$-Router then specializes in handling fewer modality combinations by assigning the top-1 gate to the expert corresponding to the observed modality combination. We evaluate Flex-MoE on the ADNI dataset, which encompasses four modalities in the Alzheimer's Disease domain, as well as on the MIMIC-IV dataset. The results demonstrate the effectiveness of Flex-MoE, highlighting its ability to model arbitrary modality combinations in diverse missing modality scenarios. Code is available at https://github.com/UNITES-Lab/flex-moe.
Submitted 10 October, 2024;
originally announced October 2024.
-
TimeBridge: Non-Stationarity Matters for Long-term Time Series Forecasting
Authors:
Peiyuan Liu,
Beiliang Wu,
Yifan Hu,
Naiqi Li,
Tao Dai,
Jigang Bao,
Shu-tao Xia
Abstract:
Non-stationarity poses significant challenges for multivariate time series forecasting due to the inherent short-term fluctuations and long-term trends that can lead to spurious regressions or obscure essential long-term relationships. Most existing methods either eliminate or retain non-stationarity without adequately addressing its distinct impacts on short-term and long-term modeling. Eliminating non-stationarity is essential for avoiding spurious regressions and capturing local dependencies in short-term modeling, while preserving it is crucial for revealing long-term cointegration across variates. In this paper, we propose TimeBridge, a novel framework designed to bridge the gap between non-stationarity and dependency modeling in long-term time series forecasting. By segmenting input series into smaller patches, TimeBridge applies Integrated Attention to mitigate short-term non-stationarity and capture stable dependencies within each variate, while Cointegrated Attention preserves non-stationarity to model long-term cointegration across variates. Extensive experiments show that TimeBridge consistently achieves state-of-the-art performance in both short-term and long-term forecasting. Additionally, TimeBridge demonstrates exceptional performance in financial forecasting on the CSI 500 and S&P 500 indices, further validating its robustness and effectiveness. Code is available at https://github.com/Hank0626/TimeBridge.
Submitted 12 October, 2024; v1 submitted 6 October, 2024;
originally announced October 2024.
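The TimeBridge abstract above mentions segmenting input series into smaller patches before attention is applied. A minimal sketch of such a patching step is given below, assuming fixed-length patches taken at a regular stride; the function name and the `patch_len` and `stride` parameters are hypothetical, and the actual TimeBridge preprocessing may differ.

```python
def segment_into_patches(series, patch_len, stride=None):
    """Split a 1-D sequence into fixed-length patches.

    stride defaults to patch_len (non-overlapping patches). Trailing values
    that do not fill a complete patch are dropped.
    """
    if stride is None:
        stride = patch_len
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    last_start = len(series) - patch_len
    return [series[s:s + patch_len] for s in range(0, last_start + 1, stride)]


series = list(range(8))
print(segment_into_patches(series, patch_len=4))
# -> [[0, 1, 2, 3], [4, 5, 6, 7]]
print(segment_into_patches(series, patch_len=4, stride=2))
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Overlapping patches (a stride smaller than `patch_len`) trade more tokens for smoother coverage of local structure; which setting the authors use is not stated in the abstract.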
-
Clustering Alzheimer's Disease Subtypes via Similarity Learning and Graph Diffusion
Authors:
Tianyi Wei,
Shu Yang,
Davoud Ataee Tarzanagh,
Jingxuan Bao,
Jia Xu,
Patryk Orzechowski,
Joost B. Wagenaar,
Qi Long,
Li Shen
Abstract:
Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects millions of people worldwide. Due to the heterogeneous nature of AD, its diagnosis and treatment pose critical challenges. Consequently, research interest in identifying homogeneous AD subtypes that can help address these challenges has grown in recent years. In this study, we aim to identify subtypes of AD that represent distinctive clinical features and underlying pathology by utilizing unsupervised clustering with graph diffusion and similarity learning. We adopted SIMLR, a multi-kernel similarity learning framework, and graph diffusion to perform clustering on a group of 829 patients with AD and mild cognitive impairment (MCI, a prodromal stage of AD) based on their cortical thickness measurements extracted from magnetic resonance imaging (MRI) scans. Although the clustering approach we utilized has not been explored for the task of AD subtyping before, it demonstrated significantly better performance than several commonly used clustering methods. Specifically, we showed the power of graph diffusion in reducing the effects of noise in the subtype detection. Our results revealed five subtypes that differed remarkably in their biomarkers, cognitive status, and some other clinical features. To evaluate the resultant subtypes further, a genetic association study was carried out and successfully identified potential genetic underpinnings of different AD subtypes. Our source code is available at: https://github.com/PennShenLab/AD-SIMLR.
Submitted 4 October, 2024;
originally announced October 2024.
-
Atmospheric Pressure Ammonia Synthesis on AuRu Catalysts Enabled by Plasmon-Controlled Hydrogenation and Nitrogen-species Desorption
Authors:
Lin Yuan,
Briley B. Bourgeois,
Elijah Begin,
Yirui Zhang,
Alan X. Dai,
Zhihua Cheng,
Amy S. McKeown-Green,
Zhichen Xue,
Yi Cui,
Kun Xu,
Yu Wang,
Matthew R. Jones,
Yi Cui,
Arun Majumdar,
Junwei Lucas Bao,
Jennifer A. Dionne
Abstract:
Ammonia is a key component of fertilizer and a potential clean fuel and hydrogen carrier. The Haber-Bosch process for ammonia synthesis consumes more than half of industrial hydrogen and contributes up to ~3% of global greenhouse gas emissions. Light-driven reactions via surface plasmon resonances offer a less energy-intensive pathway for ammonia production by altering reaction intermediates. Here, we report gold-ruthenium plasmonic bimetallic alloys for ammonia synthesis at room temperature and pressure, driven by visible light. We use colloidal synthesis to create AuRu$_x$ alloys (x=0.1, 0.2, 0.3) and disperse these nanoparticles on MgO supports for gas-phase ammonia synthesis. We observe a ~60 $\mu$mol/g/h reactivity and ~0.12% external quantum efficiency on a AuRu$_{0.2}$ sample under 100 mW/cm$^2$ visible light. In-situ diffuse reflective infrared Fourier transform spectroscopic measurements show that hydrogenation of nitrogen adsorbates is accelerated under light compared to thermocatalysis. Combining wavelength-dependent reactivity and spectroscopic findings with semi-classical electromagnetic modeling, we show plasmonic bimetallic alloys expedite ammonia synthesis by aiding hydrogenation of adsorbed nitrogen species via plasmon-mediated hot electrons. Quantum mechanical calculations reveal hydrogen-assisted N$_2$ splitting in the excited state is key to activating the reaction under ambient conditions. Therefore, light or H$_2$ alone cannot dissociate N$_2$ -- the key bottleneck to breaking N$_2$'s triple bond. Our findings are consistent with recent hypotheses on how nitrogenase enzymes catalyze ammonia production at mild conditions and provide insights for sustainable photochemical transformations.
Submitted 2 October, 2024;
originally announced October 2024.
-
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models
Authors:
Jangyeong Kim,
Donggoo Kang,
Junyoung Choi,
Jeonga Wi,
Junho Gwon,
Jiun Bae,
Dumim Yoon,
Junghyun Han
Abstract:
Text-to-texture generation has recently attracted increasing attention, but existing methods often suffer from the problems of view inconsistencies, apparent seams, and misalignment between textures and the underlying mesh. In this paper, we propose a robust text-to-texture method for generating consistent and seamless textures that are well aligned with the mesh. Our method leverages state-of-the-art 2D diffusion models, including SDXL and multiple ControlNets, to capture structural features and intricate details in the generated textures. The method also employs a symmetrical view synthesis strategy combined with regional prompts for enhancing view consistency. Additionally, it introduces novel texture blending and soft-inpainting techniques, which significantly reduce the seam regions. Extensive experiments demonstrate that our method outperforms existing state-of-the-art methods.
Submitted 30 September, 2024;
originally announced September 2024.
-
Atomic Higgsings of 6D SCFTs
Authors:
Jiakang Bao,
Hao Y. Zhang
Abstract:
In this paper, we study the full Higgs branch Hasse diagram for any given 6d $\mathcal{N}=(1,0)$ SCFT constructed via F-theory. This can be done by a procedure of determining all the minimal Higgsings on the generalized quiver of the 6d SCFT. We call this procedure the atomic Higgsing, which can be implemented iteratively. We present our general algorithms with many concrete examples of Hasse diagrams. We also compare our algorithm with the Higgsings determined by the 3d $\mathcal{N} = 4$ magnetic quivers. For the cases where the magnetic quivers are unitary, we can reproduce the full Hasse diagrams. We also construct the orthosymplectic magnetic quivers from the Type IIA brane systems for some new examples. Our approach, based on F-theory, applies to the known and new orthosymplectic cases, as well as theories that do not have known descriptions in terms of magnetic quivers. We expect our geometry-based approach to help extend the horizon of the RG flows of the 6d SCFTs.
Submitted 25 September, 2024;
originally announced September 2024.
-
SynChart: Synthesizing Charts from Language Models
Authors:
Mengchen Liu,
Qixiu Li,
Dongdong Chen,
Dong Chen,
Jianmin Bao,
Yunsheng Li
Abstract:
With the release of GPT-4V(O), its use in generating pseudo labels for multi-modality tasks has gained significant popularity. However, it is still a secret how to build such advanced models from their base large language models (LLMs). This work explores the potential of using LLMs alone for data generation and develops competitive multi-modality models focusing on chart understanding. We construct a large-scale chart dataset, SynChart, which contains approximately 4 million diverse chart images with over 75 million dense annotations, including data tables, code, descriptions, and question-answer sets. We trained a 4.2B chart-expert model using this dataset and achieved near-GPT-4O performance on the ChartQA task, surpassing GPT-4V.
Submitted 24 September, 2024;
originally announced September 2024.
-
Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing
Authors:
Seongmin Hong,
Jaehyeok Bae,
Jongho Lee,
Se Young Chun
Abstract:
Compressed sensing (CS) has emerged to overcome the inefficiency of Nyquist sampling. However, traditional optimization-based reconstruction is slow and cannot yield an exact image in practice. Deep learning-based reconstruction has been a promising alternative to optimization-based reconstruction, outperforming it in accuracy and computation speed. Finding an efficient sampling method with deep learning-based reconstruction, especially for Fourier CS, remains a challenge. Existing works on joint optimization of sampling-reconstruction ($\mathcal{H}_1$) optimize the sampling mask but have low potential, as the mask is not adaptive to each data point. Adaptive sampling ($\mathcal{H}_2$) also has the disadvantages of difficult optimization and Pareto sub-optimality. Here, we propose a novel adaptive selection of sampling-reconstruction ($\mathcal{H}_{1.5}$) framework that selects the best sampling mask and reconstruction network for each input data point. We provide theorems showing that our method has a higher potential than $\mathcal{H}_1$ and effectively solves the Pareto sub-optimality problem in sampling-reconstruction by using separate reconstruction networks for different sampling masks. To select the best sampling mask, we propose to quantify the high-frequency Bayesian uncertainty of the input, using a super-resolution space generation model. Our method outperforms joint optimization of sampling-reconstruction ($\mathcal{H}_1$) and adaptive sampling ($\mathcal{H}_2$) by achieving significant improvements on several Fourier CS problems.
Submitted 18 September, 2024; v1 submitted 18 September, 2024;
originally announced September 2024.
-
ES-KT-24: A Multimodal Knowledge Tracing Benchmark Dataset with Educational Game Playing Video and Synthetic Text Generation
Authors:
Dohee Kim,
Unggi Lee,
Sookbun Lee,
Jiyeong Bae,
Taekyung Ahn,
Jaekwon Park,
Gunho Lee,
Hyeoncheol Kim
Abstract:
This paper introduces ES-KT-24, a novel multimodal Knowledge Tracing (KT) dataset for intelligent tutoring systems in educational game contexts. Although KT is crucial in adaptive learning, existing datasets often lack game-based and multimodal elements. ES-KT-24 addresses these limitations by incorporating educational game-playing videos, synthetically generated question text, and detailed game logs. The dataset covers Mathematics, English, Indonesian, and Malaysian subjects, emphasizing diversity and including non-English content. The synthetic text component, generated using a large language model, encompasses 28 distinct knowledge concepts and 182 questions, featuring 15,032 users and 7,782,928 interactions. Our benchmark experiments demonstrate the dataset's utility for KT research by comparing Deep learning-based KT models with Language Model-based Knowledge Tracing (LKT) approaches. Notably, LKT models showed slightly higher performance than traditional DKT models, highlighting the potential of language model-based approaches in this field. Furthermore, ES-KT-24 has the potential to significantly advance research in multimodal KT models and learning analytics. By integrating game-playing videos and detailed game logs, this dataset offers a unique approach to dissecting student learning patterns through advanced data analysis and machine-learning techniques. It has the potential to unearth new insights into the learning process and inspire further exploration in the field.
Submitted 16 September, 2024;
originally announced September 2024.
-
Emergence of peaked singularities in the Euler-Poisson system
Authors:
Junsik Bae,
Sang-Hyuck Moon,
Kwan Woo
Abstract:
We consider the one-dimensional Euler-Poisson system equipped with the Boltzmann relation and provide the exact asymptotic behavior of the peaked solitary wave solutions near the peak. This enables us to study the cold ion limit of the peaked solitary waves with the sharp range of Hölder exponents. Furthermore, we provide numerical evidence for $C^1$ blow-up solutions to the pressureless Euler-Poisson system, whose blow-up profiles are asymptotically similar to its peaked solitary waves and exhibit a different form of blow-up compared to the Burgers-type (shock-like) blow-up.
Submitted 12 September, 2024;
originally announced September 2024.
-
Charged Higgs Boson Phenomenology in the Dark Z mediated Fermionic Dark Matter Model
Authors:
Kyu Jung Bae,
Jinn-Ouk Gong,
Dong-Won Jung,
Kang Young Lee,
Chaehyun Yu,
Chan Beom Park
Abstract:
We study the phenomenology of the charged Higgs boson, $H^\pm$, appearing in the fermionic dark matter model mediated by the dark $Z$ boson. This model favors a light dark $Z$ boson, $Z'$, and a light additional neutral Higgs boson, $h$. We find that $H^\pm \to W^\pm h$ and $H^\pm \to W^\pm Z'$ are the dominant decay channels. Thus, the promising final states are trilepton signals, $e μμ$ or $μμμ$, following $Z' \to μ^+ μ^-$ decays and leptonic decays of the $W^\pm$ boson. The charged Higgs boson will be produced from the top quark decays $t \to b H^\pm$ following $t \bar{t}$ production, if $H^\pm$ is light. When $H^\pm$ is heavier than the top quark, the dominant production processes are associated productions with either $Z'$ or $h$, $pp \to W^\star \to H^\pm h$ and $pp \to W^\star \to H^\pm Z'$. We explore the discovery potential of the charged Higgs boson at the LHC. We also discuss the implications of dark matter in relation to the charged Higgs phenomenology.
Submitted 19 September, 2024; v1 submitted 11 September, 2024;
originally announced September 2024.
-
How to Align Large Language Models for Teaching English? Designing and Developing LLM based-Chatbot for Teaching English Conversation in EFL, Findings and Limitations
Authors:
Jaekwon Park,
Jiyoung Bae,
Unggi Lee,
Taekyung Ahn,
Sookbun Lee,
Dohee Kim,
Aram Choi,
Yeil Jeong,
Jewoong Moon,
Hyeoncheol Kim
Abstract:
This study investigates the design, development, and evaluation of a Large Language Model (LLM)-based chatbot for teaching English conversations in an English as a Foreign Language (EFL) context. Employing the Design and Development Research (DDR) approach, we analyzed needs, established design principles, and iteratively refined a chatbot by experimenting with various LLMs and alignment methods. Through both quantitative and qualitative evaluations, we identified the most effective LLM and its prompt combination to generate high-quality, contextually appropriate responses. Interviews with teachers provided insights into desirable system features, potential educational applications, and ethical considerations in the development and deployment of the chatbots. The design iterations revealed the importance of feedback mechanisms and customizable AI personas. Future research should explore adaptive feedback strategies, collaborative approaches with various stakeholders, and the integration of insights from human-computer interaction (HCI) and user experience (UX) design. This study contributes to the growing body of research on applying LLMs in language education, providing insights and recommendations for the design, development, and evaluation of LLM-based chatbots for EFL conversation practice. As the field evolves, ongoing research and collaboration among educators, AI engineers, and other stakeholders will be essential to harness the potential of these technologies to enhance language learning experiences.
Submitted 8 September, 2024;
originally announced September 2024.
-
Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries
Authors:
Blair Yang,
Fuyang Cui,
Keiran Paster,
Jimmy Ba,
Pashootan Vaezipoor,
Silviu Pitis,
Michael R. Zhang
Abstract:
The rapid development and dynamic nature of large language models (LLMs) make it difficult for conventional quantitative benchmarks to accurately assess their capabilities. We propose report cards, which are human-interpretable, natural language summaries of model behavior for specific skills or topics. We develop a framework to evaluate report cards based on three criteria: specificity (ability to distinguish between models), faithfulness (accurate representation of model capabilities), and interpretability (clarity and relevance to humans). We also propose an iterative algorithm for generating report cards without human supervision and explore its efficacy by ablating various design choices. Through experimentation with popular LLMs, we demonstrate that report cards provide insights beyond traditional benchmarks and can help address the need for a more interpretable and holistic evaluation of LLMs.
Submitted 1 September, 2024;
originally announced September 2024.
-
From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education
Authors:
Unggi Lee,
Jiyeong Bae,
Yeonji Jung,
Minji Kang,
Gyuri Byun,
Yeonseo Lee,
Dohee Kim,
Sookbun Lee,
Jaekwon Park,
Taekyung Ahn,
Gunho Lee,
Hyeoncheol Kim
Abstract:
Knowledge Tracing (KT) is a critical component in online learning, but traditional approaches face limitations in interpretability and cross-domain adaptability. This paper introduces Language Model-based Code Knowledge Tracing (CodeLKT), an innovative application of Language model-based Knowledge Tracing (LKT) to programming education. CodeLKT leverages pre-trained language models to process learning data, demonstrating superior performance over existing KT and Code KT models. We explore Domain Adaptive Pre-Training (DAPT) and Task Adaptive Pre-Training (TAPT), showing enhanced performance in the coding domain and investigating cross-domain transfer between mathematics and coding. Additionally, we present a theoretically informed integrated system combining CodeLKT with large language models to generate personalized, in-depth feedback to support students' programming learning. This work advances the field of Code Knowledge Tracing by expanding the knowledge base with a language model-based approach and offering practical implications for programming education through data-informed feedback.
Submitted 30 August, 2024;
originally announced September 2024.
-
The random periodic solutions for McKean-Vlasov stochastic differential equations
Authors:
Jianhai Bao,
Goncalo Dos Reis,
Yue Wu
Abstract:
In this paper, we study well-posedness of random periodic solutions of stochastic differential equations (SDEs) of McKean-Vlasov type driven by a two-sided Brownian motion, where the random periodic behaviour is characterised by the equations' long-time behaviour. Given the well-known connection between McKean-Vlasov SDEs and interacting particle systems, we show propagation of chaos and that the key properties of the interacting particle systems recover those of the McKean-Vlasov SDEs in the particle limit. All results in the present work are shown under two settings: the fully dissipative case and the partially dissipative case. Each setting has its challenges and limitations. For instance, weakening full dissipativity to partial dissipativity demands stronger structural assumptions on the equations' dynamics and yields random periodic behaviour in the weak sense instead of the pathwise sense (as in the full dissipativity case). The proof mechanisms are close but fundamentally different.
Submitted 30 August, 2024;
originally announced August 2024.
-
Machine-learning certification of multipartite entanglement for noisy quantum hardware
Authors:
Andreas J. C. Fuchs,
Eric Brunner,
Jiheon Seong,
Hyeokjea Kwon,
Seungchan Seo,
Joonwoo Bae,
Andreas Buchleitner,
Edoardo G. Carnio
Abstract:
Entanglement is a fundamental aspect of quantum physics, both conceptually and for its many applications. Classifying an arbitrary multipartite state as entangled or separable -- a task referred to as the separability problem -- poses a significant challenge, since a state can be entangled with respect to many of its different partitions. We develop a certification pipeline that feeds the statistics of random local measurements into a non-linear dimensionality reduction algorithm, to determine with respect to which partitions a given quantum state is entangled. After training a model on randomly generated quantum states, entangled in different partitions and of varying purity, we verify the accuracy of its predictions on simulated test data, and finally apply it to states prepared on IBM quantum computing hardware.
Submitted 22 August, 2024;
originally announced August 2024.
-
Interactive-T2S: Multi-Turn Interactions for Text-to-SQL with Large Language Models
Authors:
Guanming Xiong,
Junwei Bao,
Hongfei Jiang,
Yang Song,
Wen Zhao
Abstract:
This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities of large language models (LLMs). Despite recent advancements, existing LLM-based methods have not adequately addressed scalability, leading to inefficiencies when processing wide tables. Furthermore, current interaction-based approaches either lack a step-by-step, interpretable SQL generation process or fail to provide an efficient and universally applicable interaction design. To address these challenges, we introduce Interactive-T2S, a framework that generates SQL queries through direct interactions with databases. This framework includes four general tools that facilitate proactive and efficient information retrieval by the LLM. Additionally, we have developed detailed exemplars to demonstrate the step-wise reasoning processes within our framework. Our experiments on the BIRD-Dev dataset, employing a setting without oracle knowledge, reveal that our method achieves state-of-the-art results with only two exemplars, underscoring the effectiveness and robustness of our framework.
Submitted 9 August, 2024;
originally announced August 2024.
-
An Exploratory Case Study of Query Plan Representations
Authors:
Jinsheng Ba,
Manuel Rigger
Abstract:
In database systems, a query plan is a series of concrete internal steps to execute a query. Multiple testing approaches utilize query plans for finding bugs. However, query plans are represented in a database-specific manner, so implementing these testing approaches requires a non-trivial effort, hindering their adoption. We envision that a unified query plan representation can facilitate the implementation of these approaches. In this paper, we present an exploratory case study to investigate query plan representations in nine widely-used database systems. Our study shows that query plan representations consist of three conceptual components: operations, properties, and formats, which enable us to design a unified query plan representation. Based on it, existing testing methods can be efficiently adopted, finding 17 previously unknown and unique bugs. Additionally, the unified query plan representation can facilitate other applications. Existing visualization tools can support multiple database systems based on the unified query plan representation with moderate implementation effort, and comparing unified query plans across database systems provides actionable insights to improve their performance. We expect that the unified query plan representation will enable the exploration of additional application scenarios.
Submitted 15 August, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
-
Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
Authors:
Hyunjee Lee,
Youngsik Yun,
Jeongmin Bae,
Seoha Kim,
Youngjung Uh
Abstract:
Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS as follows. 1) We directly supervise the 3D points to train the language embedding field. It achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. Project page: https://hyunji12.github.io/Open3DRF
Submitted 18 August, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
-
Photometric Inverse Rendering: Shading Cues Modeling and Surface Reflectance Regularization
Authors:
Jingzhi Bao,
Guanying Chen,
Shuguang Cui
Abstract:
This paper addresses the problem of inverse rendering from photometric images. Existing approaches for this problem suffer from the effects of self-shadows, inter-reflections, and lack of constraints on the surface reflectance, leading to inaccurate decomposition of reflectance and illumination due to the ill-posed nature of inverse rendering. In this work, we propose a new method for neural inverse rendering. Our method jointly optimizes the light source position to account for the self-shadows in images, and computes indirect illumination using a differentiable rendering layer and an importance sampling strategy. To enhance surface reflectance decomposition, we introduce a new regularization by distilling DINO features to foster accurate and consistent material decomposition. Extensive experiments on synthetic and real datasets demonstrate that our method outperforms the state-of-the-art methods in reflectance decomposition.
Submitted 13 August, 2024;
originally announced August 2024.
-
Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators
Authors:
Chuwei Wang,
Julius Berner,
Zongyi Li,
Di Zhou,
Jiayun Wang,
Jane Bae,
Anima Anandkumar
Abstract:
Accurately predicting the long-term behavior of chaotic systems is crucial for various applications such as climate modeling. However, achieving such predictions typically requires iterative computations over a dense spatiotemporal grid to account for the unstable nature of chaotic systems, which is expensive and impractical in many real-world situations. An alternative approach to such a fully-resolved simulation is using a coarse grid and then correcting its errors through a \textit{closure model}, which approximates the overall information from fine scales not captured in the coarse-grid simulation. Recently, ML approaches have been used for closure modeling, but they typically require a large number of training samples from expensive fully-resolved simulations (FRS). In this work, we prove an even more fundamental limitation, i.e., the standard approach to learning closure models suffers from a large approximation error for generic problems, no matter how large the model is, and it stems from the non-uniqueness of the mapping. We propose an alternative end-to-end learning approach using a physics-informed neural operator (PINO) that overcomes this limitation by not using a closure model or a coarse-grid solver. We first train the PINO model on data from a coarse-grid solver and then fine-tune it with (a small amount of) FRS and physics-based losses on a fine grid. The discretization-free nature of neural operators means that they do not suffer from the restriction of a coarse grid that closure models face, and they can provably approximate the long-term statistics of chaotic systems. In our experiments, our PINO model achieves a 330x speedup compared to FRS with a relative error of $\sim 10\%$. In contrast, the closure model coupled with a coarse-grid solver is $60$x slower than PINO while having a much higher error of $\sim186\%$ when the closure model is trained on the same FRS dataset.
Submitted 9 October, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
-
A Survey of Protoplanetary Disks Using the Keck/NIRC2 Vortex Coronagraph
Authors:
Nicole L. Wallack,
Jean-Baptiste Ruffio,
Garreth Ruane,
Bin B. Ren,
Jerry W. Xuan,
Marion Villenave,
Dimitri Mawet,
Karl Stapelfeldt,
Jason J. Wang,
Michael C. Liu,
Olivier Absil,
Carlos Alvarez,
Jaehan Bae,
Charlotte Bond,
Michael Bottom,
Benjamin Calvin,
Élodie Choquet,
Valentin Christiaens,
Therese Cook,
Bruno Femenía Castellá,
Carlos Gomez Gonzalez,
Greta Guidi,
Elsa Huby,
Joel Kastner,
Heather A. Knutson
, et al. (12 additional authors not shown)
Abstract:
Recent Atacama Large Millimeter/submillimeter Array (ALMA) observations of protoplanetary disks in the millimeter continuum have shown a variety of radial gaps, cavities, and spiral features. These substructures may be signposts for ongoing planet formation, and therefore these systems are promising targets for direct imaging planet searches in the near-infrared. To this end, we present results from a deep imaging survey in the $L'$-band (3.8 $\mu$m) with the Keck/NIRC2 vortex coronagraph to search for young planets in 43 disks with resolved features in the millimeter continuum or evidence for gaps/central cavities from their spectral energy distributions. Although we do not detect any new point sources, using the vortex coronagraph allows for high sensitivity to faint sources at small angular separations (down to ${\sim}$0$^{\prime\prime}$.1), allowing us to place strong upper limits on the masses of potential gas giant planets. We compare our mass sensitivities to the masses of planets derived using ALMA observations, and while we are sensitive to $\sim$1 M$_{Jup}$ planets in the gaps in some of our systems, we are generally not sensitive to planets of the masses expected from the ALMA observations. In addition to placing upper limits on the masses of gas giant planets that could be interacting with the dust in the disks to form the observed millimeter substructures, we are also able to map the micron-sized dust as seen in scattered light for 8 of these systems. Our large sample of systems also allows us to investigate limits on planetary accretion rates and disk viscosities.
Submitted 7 August, 2024;
originally announced August 2024.
-
Decomposed Prompting to Answer Questions on a Course Discussion Board
Authors:
Brandon Jaipersaud,
Paul Zhang,
Jimmy Ba,
Andrew Petersen,
Lisa Zhang,
Michael R. Zhang
Abstract:
We propose and evaluate a question-answering system that uses decomposed prompting to classify and answer student questions on a course discussion board. Our system uses a large language model (LLM) to classify questions into one of four types: conceptual, homework, logistics, and not answerable. This enables us to employ a different strategy for answering questions that fall under different types. Using a variant of GPT-3, we achieve $81\%$ classification accuracy. We discuss our system's performance on answering conceptual questions from a machine learning course and various failure modes.
Submitted 30 July, 2024;
originally announced July 2024.
-
Magnon Spectra of Cuprates beyond Spin Wave Theory
Authors:
Jiahui Bao,
Matthias Gohlke,
Jeffrey G. Rau,
Nic Shannon
Abstract:
The usual starting point for understanding magnons in cuprate antiferromagnets such as La$_2$CuO$_4$ is a spin model incorporating cyclic exchange, which descends from a one-band Hubbard model, and has parameters taken from fits based on non-interacting spin wave theory. Here we explore whether this provides a reliable description of experiment, using matrix product states (MPS) to calculate magnon spectra beyond spin wave theory. We find that analysis based on low orders of spin wave theory leads to systematic overestimates of exchange parameters, with corresponding errors in estimates of Hubbard $t/U$. Once these are corrected, the ''standard'' model provides a good account of magnon dispersion and lineshape in La$_2$CuO$_4$, but fails to fully capture the continuum observed at high energies. The extension of this analysis to CaCuO$_2$ and Sr$_2$IrO$_4$ is also discussed.
Submitted 29 July, 2024;
originally announced July 2024.
-
Unraveling the role of Ta in the phase transition of Pb(Ta1+xSe2)2 using low-temperature Raman spectroscopy
Authors:
Yu Ma,
Chi Sin Tang,
Xiaohui Yang,
Yi Wei Ho,
Jun Zhou,
Wenjun Wu,
Shuo Sun,
Jin-Ke Bao,
Dingguan Wang,
Xiao Lin,
Magdalena Grzeszczyk,
Shijie Wang,
Mark B H Breese,
Chuanbing Cai,
Andrew T. S. Wee,
Maciej Koperski,
Zhu-An Xu,
Xinmao Yin
Abstract:
Phase engineering strategies in two-dimensional transition metal dichalcogenides (2D-TMDs) have garnered significant attention due to their potential applications in electronics, optoelectronics, and energy storage. Various methods, including direct synthesis, pressure control, and chemical doping, have been employed to manipulate structural transitions in 2D-TMDs. Metal intercalation emerges as an effective technique to modulate phase transition dynamics by inserting external atoms or ions between the layers of 2D-TMDs, altering their electronic structure and physical properties. Here, we investigate the significant structural phase transitions in Pb(Ta1+xSe2)2 single crystals induced by Ta intercalation using a combination of Raman spectroscopy and first-principles calculations. The results highlight the pivotal role of Ta atoms in driving these transitions and elucidate the interplay between intercalation, phase transitions, and resulting electronic and vibrational properties in 2D-TMDs. By focusing on Pb(Ta1+xSe2)2 as an ideal case study and investigating like metal intercalation, this study advances understanding in the field and paves the way for the development of novel applications for 2D-TMDs, offering insights into the potential of these materials for future technological advancements.
Submitted 8 August, 2024; v1 submitted 28 July, 2024;
originally announced July 2024.
-
MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability
Authors:
Buyu Liu,
Kai Wang,
Yansong Liu,
Jun Bao,
Tingting Han,
Jun Yu
Abstract:
This work aims to address multi-view perspective RGB generation from text prompts given Bird-Eye-View (BEV) semantics. Unlike prior methods that neglect layout consistency, lack the ability to handle detailed text prompts, or are incapable of generalizing to unseen view points, MVPbev simultaneously generates cross-view consistent images of different perspective views with a two-stage design, allowing object-level control and novel view generation at test-time. Specifically, MVPbev first projects given BEV semantics to perspective view with camera parameters, empowering the model to generalize to unseen view points. Then we introduce a multi-view attention module where special initialization and de-noising processes are introduced to explicitly enforce local consistency among overlapping views w.r.t. cross-view homography. Last but not least, MVPbev further allows test-time instance-level controllability by refining a pre-trained text-to-image diffusion model. Our extensive experiments on NuScenes demonstrate that our method is capable of generating high-resolution photorealistic images from text descriptions with thousands of training samples, surpassing the state-of-the-art methods under various evaluation metrics. We further demonstrate the advances of our method in terms of generalizability and controllability with the help of novel evaluation metrics and comprehensive human analysis. Our code, data, and model can be found in \url{https://github.com/kkaiwwana/MVPbev}.
Submitted 28 July, 2024;
originally announced July 2024.
-
Singularity formation of hydromagnetic waves in cold plasma
Authors:
Junsik Bae,
Junho Choi,
Bongsuk Kwon
Abstract:
We study $C^1$ blow-up of the compressible fluid model introduced by Gardner and Morikawa, which describes the dynamics of a magnetized cold plasma. We propose sufficient conditions that lead to $C^1$ blow-up. In particular, we find that smooth solutions can break down in finite time even if the gradient of initial velocity is identically zero. The density and the gradient of the velocity become unbounded as time approaches the lifespan of the smooth solution. The Lagrangian formulation reduces the singularity formation problem to finding a zero of the associated second-order ODE.
Submitted 26 July, 2024;
originally announced July 2024.
-
Delta-shock for the pressureless Euler-Poisson system
Authors:
Junsik Bae,
Yunjoo Kim,
Bongsuk Kwon
Abstract:
We study singularity formation for the pressureless Euler-Poisson system of cold ion dynamics. In contrast to the Euler-Poisson system with pressure, when its smooth solutions experience $C^1$ blow-up, the $L^\infty$ norm of the density becomes unbounded, which is often referred to as a delta-shock. We provide a constructive proof of singularity formation to obtain an exact blow-up profile and the detailed asymptotic behavior of the solutions near the blow-up point in both time and space. Our result indicates that at the blow-up time $t=T_\ast$, the density function is unbounded but is locally integrable with the profile of $\rho(x,T_\ast) \sim (x-x_\ast)^{-2/3}$ near the blow-up point $x=x_\ast$. This profile is not yet a Dirac measure. On the other hand, the velocity function has $C^{1/3}$ regularity at the blow-up point. Loosely following our analysis, we also obtain an exact blow-up profile for the pressureless Euler equations.
Submitted 22 July, 2024;
originally announced July 2024.
-
Partial Difference Sets with Denniston Parameters in Elementary Abelian $p$-Groups
Authors:
Jingjun Bao,
Qing Xiang,
Meng Zhao
Abstract:
Denniston \cite{D1969} constructed partial difference sets (PDS) with parameters $(2^{3m}, (2^{m+r}-2^m+2^r)(2^m-1), 2^m-2^r+(2^{m+r}-2^m+2^r)(2^r-2), (2^{m+r}-2^m+2^r)(2^r-1))$ in elementary abelian groups of order $2^{3m}$ for all $m\geq 2$ and $1 \leq r < m$. These PDS correspond to maximal arcs in the Desarguesian projective planes PG$(2, 2^m)$. Davis et al. \cite{DHJP2024} and also De Winter \cite{dewinter23} presented constructions of PDS with Denniston parameters $(p^{3m}, (p^{m+r}-p^m+p^r)(p^m-1), p^m-p^r+(p^{m+r}-p^m+p^r)(p^r-2), (p^{m+r}-p^m+p^r)(p^r-1))$ in elementary abelian groups of order $p^{3m}$ for all $m \geq 2$ and $r \in \{1, m-1\}$, where $p$ is an odd prime. The constructions in \cite{DHJP2024, dewinter23} are particularly intriguing, as it was shown by Ball, Blokhuis, and Mazzocca \cite{BBM1997} that no nontrivial maximal arcs in PG$(2, q^m)$ exist for any odd prime power $q$. In this paper, we show that PDS with Denniston parameters $(q^{3m}, (q^{m+r}-q^m+q^r)(q^m-1), q^m-q^r+(q^{m+r}-q^m+q^r)(q^r-2), (q^{m+r}-q^m+q^r)(q^r-1))$ exist in elementary abelian groups of order $q^{3m}$ for all $m \geq 2$ and $1 \leq r < m$, where $q$ is an arbitrary prime power.
Submitted 22 July, 2024;
originally announced July 2024.
-
Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks
Authors:
Jiayu Lin,
Guanrong Chen,
Bojun Jin,
Chenyang Li,
Shutong Jia,
Wancong Lin,
Yang Sun,
Yuhang He,
Caihua Yang,
Jianzhu Bao,
Jipeng Wu,
Wen Su,
Jinglu Chen,
Xinyi Li,
Tianyu Chen,
Mingjie Han,
Shuaiwen Du,
Zijian Wang,
Jiyin Li,
Fuzhong Suo,
Hao Wang,
Nuanchen Lin,
Xuanjing Huang,
Changjian Jiang,
RuiFeng Xu
, et al. (4 additional authors not shown)
Abstract:
In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affective Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track has its own dataset and baseline model. In total, 32 competing teams registered for the challenge, from which we received 11 successful submissions. In this paper, we present the results of the challenge and a summary of the systems, highlighting commonalities and innovations among participating systems. Datasets and baseline models of the AI-Debater 2023 Challenge have already been released and can be accessed through the official website of the challenge.
Submitted 24 July, 2024; v1 submitted 20 July, 2024;
originally announced July 2024.
-
Interim report for the International Muon Collider Collaboration (IMCC)
Authors:
C. Accettura,
S. Adrian,
R. Agarwal,
C. Ahdida,
C. Aimé,
A. Aksoy,
G. L. Alberghi,
S. Alden,
N. Amapane,
D. Amorim,
P. Andreetto,
F. Anulli,
R. Appleby,
A. Apresyan,
P. Asadi,
M. Attia Mahmoud,
B. Auchmann,
J. Back,
A. Badea,
K. J. Bae,
E. J. Bahng,
L. Balconi,
F. Balli,
L. Bandiera,
C. Barbagallo
, et al. (362 additional authors not shown)
Abstract:
The International Muon Collider Collaboration (IMCC) [1] was established in 2020 following the recommendations of the European Strategy for Particle Physics (ESPP) and the implementation of the European Strategy for Particle Physics-Accelerator R&D Roadmap by the Laboratory Directors Group [2], hereinafter referred to as the European LDG roadmap. The Muon Collider Study (MuC) covers the accelerator complex, detectors and physics for a future muon collider. In 2023, European Commission support was obtained for a design study of a muon collider (MuCol) [3]. This project started on 1st March 2023, with work-packages aligned with the overall muon collider studies. In preparation for and during the 2021-22 U.S. Snowmass process, the muon collider project parameters, technical studies and physics performance studies were performed and presented in great detail. Recently, the P5 panel [4] in the U.S. recommended a muon collider R&D, proposed to join the IMCC and envisages that the U.S. should prepare to host a muon collider, calling this their "muon shot". In the past, the U.S. Muon Accelerator Programme (MAP) [5] has been instrumental in studies of concepts and technologies for a muon collider.
Submitted 17 July, 2024;
originally announced July 2024.
-
In-situ local imaging of ferromagnetism and superconductivity in RbEuFe$_4$As$_4$
Authors:
Huiyuan Man,
Yusuke Iguchi,
Jin-Ke Bao,
Duck Young Chung,
Mercouri G. Kanatzidis
Abstract:
The coexistence of superconductivity and ferromagnetism is an intrinsically interesting research focus in condensed matter physics, but its study is limited by low superconducting ($T_c$) and magnetic ($T_m$) transition temperatures in related materials. Here, we used a scanning superconducting quantum interference device to image the in-situ diamagnetic and ferromagnetic responses of RbEuFe$_4$As$_4$ with high $T_c$ and $T_m$. We observed significant suppression of superfluid density in the vicinity of the magnetic phase transition, signifying fluctuation-enhanced magnetic scattering between Eu spins and Fe 3$d$ conduction electrons. Intriguingly, we observed multiple ferromagnetic domains which should be absent in an ideal magnetic helical phase. The formation of these domains demonstrates a weak $c$-axis ferromagnetic component probably arising from an Eu spin-canting effect, indicative of possible superconductivity-driven domain Meissner and domain vortex-antivortex phases as revealed in EuFe$_2$(As$_{0.79}$P$_{0.21}$)$_2$. Our observations highlight that RbEuFe$_4$As$_4$ is a unique system which includes multiple interplay channels between superconductivity and ferromagnetism.
Submitted 15 July, 2024;
originally announced July 2024.
-
Global destabilization of drift-tearing mode with coupling to discretized electron drift-wave instability
Authors:
J. Bao,
W. L. Zhang,
Z. Lin,
H. S. Cai,
D. J. Liu,
H. T. Chen,
C. Dong,
J. T. Cao,
D. Li
Abstract:
The global linear behaviors of 2/1 DTM in the collisional regime are investigated based on a concisely resistive drift-MHD model. Besides DTM, extra normal modes including EDW and SAW are coupled together and destabilized in different parameter regimes by considering resistivity in this system. The EVP approach is applied for solving the eigenstate spectra with the distribution of all unstable solutions. It is found that in the small EDD frequency (omega_*e) regime, DTM growth rate agrees well with local theory that is reduced with increasing omega_*e. However, when omega_*e exceeds a critical threshold omega_*crit, the strongly linear coupling between DTM and other discretized EDW instabilities happens so that the free energies from current and pressure channels can be released together and thus enhance the DTM, of which growth rate increases with increasing omega_*e and deviates from local theory results qualitatively. Correspondingly, a cross-scale mode structure forms with mixed polarization, namely, phi perturbation is dominated by electrostatic polarized short-wavelength oscillation as EDW instability character, and A_para perturbation remains typical tearing mode solution of Alfvenic polarized macroscopic structure. Within omega_*e > omega_*crit, the additional IDD causes phi oscillating structure to shift towards small density gradient domain, which cancels the extra drive from ion channel and thus DTM growth rate is insensitive to IDD frequency. Compared to EDD effects, the IDD effect alone with zero-omega_*e only leads to the stabilization of RTM that shows agreements between global simulation and local theory, which is no longer the condition for DTM regime. These results are useful for clarifying the DTM global properties with underlying physics mechanisms, which occurs in the regime of omega_*e >> gamma_c that is relevant to nowadays tokamak discharges with hot plasmas.
Submitted 15 July, 2024;
originally announced July 2024.
-
TLDR: Unsupervised Goal-Conditioned RL via Temporal Distance-Aware Representations
Authors:
Junik Bae,
Kwanyoung Park,
Youngwoon Lee
Abstract:
Unsupervised goal-conditioned reinforcement learning (GCRL) is a promising paradigm for developing diverse robotic skills without external supervision. However, existing unsupervised GCRL methods often struggle to cover a wide range of states in complex environments due to their limited exploration and sparse or noisy rewards for GCRL. To overcome these challenges, we propose a novel unsupervised GCRL method that leverages TemporaL Distance-aware Representations (TLDR). TLDR selects faraway goals to initiate exploration and computes intrinsic exploration rewards and goal-reaching rewards, based on temporal distance. Specifically, our exploration policy seeks states with large temporal distances (i.e. covering a large state space), while the goal-conditioned policy learns to minimize the temporal distance to the goal (i.e. reaching the goal). Our experimental results in six simulated robotic locomotion environments demonstrate that our method significantly outperforms previous unsupervised GCRL methods in achieving a wide variety of states.
Submitted 11 July, 2024;
originally announced July 2024.
-
ALMA high-resolution observations unveil planet formation shaping molecular emission in the PDS 70 disk
Authors:
L. Rampinelli,
S. Facchini,
M. Leemker,
J. Bae,
M. Benisty,
R. Teague,
C. J. Law,
K. I. Öberg,
B. Portilla-Revelo,
A. J. Cridland
Abstract:
With two directly detected protoplanets, the PDS 70 system is a unique source in which to study the complex interplay between forming planets and their natal environment. The large dust cavity carved by the two giant planets can affect the disk chemistry, and therefore the molecular emission morphology. On the other hand, chemical properties of the gas component of the disk are expected to leave an imprint on the planetary atmospheres. In this work, we reconstruct the emission morphology of a rich inventory of molecular tracers in the PDS 70 disk, and we look for possible chemical signatures of the two actively accreting protoplanets, PDS b and c. We leverage Atacama Large Millimeter/submillimeter Array (ALMA) band 6 high-angular-resolution and deep-sensitivity line emission observations, together with image and $uv$-plane techniques, to boost the detection of faint lines. We robustly detect ring-shaped emission from $^{12}$CO, $^{13}$CO, C$^{18}$O, H$^{13}$CN, HC$^{15}$N, DCN, H$_2$CO, CS, C$_2$H, and H$^{13}$CO$^{+}$ lines in unprecedented detail. Most of the molecular tracers show a peak of the emission inside the millimeter dust peak. We interpret this as the direct impact of the effective irradiation of the cavity wall, as a result of the planet formation process. Moreover, we have found evidence of an O-poor gas reservoir in the outer disk, which is supported by the observations of bright C-rich molecules, the non-detection of SO, and a lower limit on the $\mathrm{CS/SO}$ ratio of $\sim1$. Eventually, we provide the first detection of the c-C$_3$H$_2$ transitions at 218.73 GHz, and the marginal detection of an azimuthal asymmetry in the higher-energy H$_2$CO (3$_{2,1}$-2$_{2,0}$) line, which could be due to accretion heating near PDS 70b.
Submitted 8 July, 2024;
originally announced July 2024.
-
Comparison of Short-Range Order in GeSn Grown by Molecular Beam Epitaxy and Chemical Vapor Deposition
Authors:
Shang Liu,
Yunfan Liang,
Haochen Zhao,
Nirosh M. Eldose,
Jin-Hee Bae,
Omar Concepcion,
Xiaochen Jin,
Shunda Chen,
Ilias Bikmukhametov,
Austin Akey,
Cory T. Cline,
Alejandra Cuervo Covian,
Xiaoxin Wang,
Tianshu Li,
Yuping Zeng,
Dan Buca,
Shui-Qing Yu,
Gregory J. Salamo,
Shengbai Zhang,
Jifeng Liu
Abstract:
Atomic short-range order (SRO) in direct-bandgap GeSn for infrared photonics has recently attracted attention due to its notable impact on band structures. However, the SRO in GeSn thin films grown by different methods has hardly been compared. This paper compares SRO in GeSn thin films of similar compositions grown by molecular beam epitaxy (MBE) and chemical vapor deposition (CVD) using atom probe tomography. An $\sim$15% stronger preference for Sn-Sn 1$^{st}$ nearest neighbor (1NN) is observed in MBE GeSn than CVD GeSn for both thin film and quantum well samples with Sn composition ranging from 7 to 20%. Interestingly, samples grown by different deposition tools under the same method (either MBE or CVD) showed remarkable consistency in Sn-Sn 1NN SRO, while MBE vs. CVD showed clear differences. Supported by theoretical modeling, we consider that this difference in SRO originates from the impact of surface termination, where MBE surfaces are exposed to ultrahigh vacuum while CVD surfaces are terminated by H to a good extent. This finding not only suggests engineering surface termination or surfactants during the growth as a potential approach to control SRO in GeSn, but also provides insight into the underlying reasons for the very different growth temperatures between MBE and CVD that directly impact the strain relaxation behavior.
Submitted 2 July, 2024;
originally announced July 2024.
-
Constraints on the gas-phase C/O ratio of DR Tau's outer disk from CS, SO, and C$_2$H observations
Authors:
Jane Huang,
Edwin A. Bergin,
Romane Le Gal,
Sean M. Andrews,
Jaehan Bae,
Luke Keyte,
J. A. Sturm
Abstract:
Millimeter wavelength observations of Class II protoplanetary disks often display strong emission from hydrocarbons and high CS/SO values, providing evidence that the gas-phase C/O ratio commonly exceeds 1 in their outer regions. We present new NOEMA observations of CS $5-4$, SO $7_6-6_5$ and $5_6-4_5$, C$_2$H $N=3-2$, HCN $3-2$, HCO$^+$ $3-2$, and H$^{13}$CO$^+$ $3-2$ in the DR Tau protoplanetary disk at a resolution of $\sim0.4''$ (80 au). Estimates for the disk-averaged CS/SO ratio range from $\sim0.4-0.5$, the lowest value reported thus far for a T Tauri disk. At a projected separation of $\sim180$ au northeast of the star, the SO moment maps exhibit a clump that has no counterpart in the other lines, and the CS/SO value decreases to $<0.2$ at its location. Thermochemical models calculated with DALI indicate that DR Tau's low CS/SO ratio and faint C$_2$H emission can be explained by a gas-phase C/O ratio that is $<1$ at the disk radii traced by NOEMA. Comparisons of DR Tau's SO emission to maps of extended structures traced by $^{13}$CO suggest that late infall may contribute to driving down the gas-phase C/O ratio of its disk.
Submitted 1 July, 2024;
originally announced July 2024.
-
A principled framework to assess the information-theoretic fitness of brain functional sub-circuits
Authors:
Duy Duong-Tran,
Nghi Nguyen,
Shizhuo Mu,
Jiong Chen,
Jingxuan Bao,
Frederick Xu,
Sumita Garai,
Jose Cadena-Pico,
Alan David Kaplan,
Tianlong Chen,
Yize Zhao,
Li Shen,
Joaquín Goñi
Abstract:
In systems and network neuroscience, many common practices in brain connectomic analysis are often not properly scrutinized. One such practice is mapping a predetermined set of sub-circuits, like functional networks (FNs), onto subjects' functional connectomes (FCs) without adequately assessing the information-theoretic appropriateness of the partition. Another practice that goes unchallenged is thresholding weighted FCs to remove spurious connections without justifying the chosen threshold. This paper leverages recent theoretical advances in Stochastic Block Models (SBMs) to formally define and quantify the information-theoretic fitness (e.g., prominence) of a predetermined set of FNs when mapped to individual FCs under different fMRI task conditions. Our framework allows for evaluating any combination of FC granularity, FN partition, and thresholding strategy, thereby optimizing these choices to preserve important topological features of the human brain connectomes. By applying to the Human Connectome Project with Schaefer parcellations at multiple levels of granularity, the framework showed that the common thresholding value of 0.25 was indeed information-theoretically valid for group-average FCs despite its previous lack of justification. Our results pave the way for the proper use of FNs and thresholding methods and provide insights for future research in individualized parcellations.
Submitted 23 July, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
-
Improved bounds on quantum uncommon information
Authors:
Yonghae Lee,
Joonwoo Bae,
Hayata Yamasaki,
Soojoon Lee
Abstract:
In classical information theory, channel capacity quantifies the maximum number of messages that can be reliably transmitted using shared information. An equivalent concept, termed uncommon information, represents the number of messages required to be exchanged to completely share all information in common. However, this equivalence does not extend to quantum information theory. Specifically, quantum uncommon information is operationally defined as the minimal amount of entanglement required for the quantum communication task of quantum state exchange, where two parties exchange quantum states to share all quantum messages in common. Currently, an analytical closed-form expression for the quantum uncommon information remains undetermined. In this work, by investigating underlying characterization of the quantum uncommon information, we derive improved bounds on it. To obtain these bounds, we develop a subspace exchange strategy that leverages a common subspace of two parties to identify the unnecessary qubits for exchange. We also consider a referee-assisted exchange, wherein a referee aids two parties in efficiently performing the quantum state exchange. Our bounds provide more precise estimations for the quantum uncommon information. Furthermore, we demonstrate that the subspace technique is a versatile tool for characterizing uncommon information not only in the bipartite scenario but also in various multi-partite ones.
Submitted 24 July, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
Scalable Training of Trustworthy and Energy-Efficient Predictive Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN
Authors:
Massimiliano Lupo Pasini,
Jong Youl Choi,
Kshitij Mehta,
Pei Zhang,
David Rogers,
Jonghyun Bae,
Khaled Z. Ibrahim,
Ashwin M. Aji,
Karl W. Schulz,
Jorda Polo,
Prasanna Balaprakash
Abstract:
We present our work on developing and training scalable, trustworthy, and energy-efficient predictive graph foundation models (GFMs) using HydraGNN, a multi-headed graph convolutional neural network architecture. HydraGNN expands the boundaries of graph neural network (GNN) computations in both training scale and data diversity. It abstracts over message passing algorithms, allowing both reproduction of and comparison across algorithmic innovations that define nearest-neighbor convolution in GNNs. This work discusses a series of optimizations that have allowed scaling up the GFMs training to tens of thousands of GPUs on datasets that consist of hundreds of millions of graphs. Using over 154 million atomistic structures for training, we illustrate the performance of our approach along with the lessons learned on two state-of-the-art United States Department of Energy (US-DOE) supercomputers, namely the Perlmutter petascale system at the National Energy Research Scientific Computing Center and the Frontier exascale system at Oak Ridge Leadership Computing Facility. The HydraGNN architecture enables the GFM to achieve near-linear strong scaling performance using more than 2,000 GPUs on Perlmutter and 16,000 GPUs on Frontier. Hyperparameter optimization (HPO) was performed on over 64,000 Graphic Compute Dies (GCDs) on Frontier to select GFM architectures with high accuracy. Each HPO trial was ranked based on both accuracy and energy consumption. The training of an ensemble of highest-ranked GFM architectures (selected with judicious balance between accuracy and energy consumption) continued until convergence to establish uncertainty quantification (UQ) capabilities with ensemble learning. Our contributions establish core capabilities for rapidly developing, training, and deploying further GFMs using large-scale computational resources to enable AI-accelerated materials discovery and design.
Submitted 16 October, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
Authors:
Xinzhi Mu,
Li Chen,
Bohan Chen,
Shuyang Gu,
Jianmin Bao,
Dong Chen,
Ji Li,
Yuhui Yuan
Abstract:
Recently, the application of modern diffusion-based text-to-image generation models for creating artistic fonts, traditionally the domain of professional designers, has garnered significant interest. Diverging from the majority of existing studies that concentrate on generating artistic typography, our research aims to tackle a novel and more demanding challenge: the generation of text effects for multilingual fonts. This task essentially requires generating coherent and consistent visual content within the confines of a font-shaped canvas, as opposed to a traditional rectangular canvas. To address this task, we introduce a novel shape-adaptive diffusion model capable of interpreting the given shape and strategically planning pixel distributions within the irregular canvas. To achieve this, we curate a high-quality shape-adaptive image-text dataset and incorporate the segmentation mask as a visual condition to steer the image generation process within the irregular-canvas. This approach enables the traditionally rectangle canvas-based diffusion model to produce the desired concepts in accordance with the provided geometric shapes. Second, to maintain consistency across multiple letters, we also present a training-free, shape-adaptive effect transfer method for transferring textures from a generated reference letter to others. The key insights are building a font effect noise prior and propagating the font effect information in a concatenated latent space. The efficacy of our FontStudio system is confirmed through user preference studies, which show a marked preference (78% win-rates on aesthetics) for our system even when compared to the latest unrivaled commercial product, Adobe Firefly.
Submitted 12 June, 2024;
originally announced June 2024.
-
Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets
Authors:
Da Li,
Guoqiang Zhao,
Houjun Sun,
Jiacheng Bao
Abstract:
Multi-baseline SAR 3D imaging faces significant challenges due to data sparsity. In recent years, deep learning techniques have achieved notable success in enhancing the quality of sparse SAR 3D imaging. However, previous work typically relies on full-aperture high-resolution radar images to supervise the training of deep neural networks (DNNs), utilizing only single-modal information from radar data. Consequently, imaging performance is limited, and acquiring full-aperture data for multi-baseline SAR is costly and sometimes impractical in real-world applications. In this paper, we propose a Cross-Modal Reconstruction Network (CMR-Net), which integrates differentiable rendering and cross-modal supervision with optical images to reconstruct highly sparse multi-baseline SAR 3D images of vehicle targets into visually structured and high-resolution images. We meticulously designed the network architecture and training strategies to enhance network generalization capability. Remarkably, CMR-Net, trained solely on simulated data, demonstrates high-resolution reconstruction capabilities on both publicly available simulation datasets and real measured datasets, outperforming traditional sparse reconstruction algorithms based on compressed sensing and other learning-based methods. Additionally, using optical images as supervision provides a cost-effective way to build training datasets, reducing the difficulty of method dissemination. Our work showcases the broad prospects of deep learning in multi-baseline SAR 3D imaging and offers a novel path for researching radar imaging based on cross-modal learning theory.
Submitted 8 August, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task
Authors:
Unggi Lee,
Jiyeong Bae,
Dohee Kim,
Sookbun Lee,
Jaekwon Park,
Taekyung Ahn,
Gunho Lee,
Damji Stratton,
Hyeoncheol Kim
Abstract:
Knowledge Tracing (KT) is a critical task in online learning for modeling student knowledge over time. Despite the success of deep learning-based KT models, which rely on sequences of numbers as data, most existing approaches fail to leverage the rich semantic information in the text of questions and concepts. This paper proposes Language model-based Knowledge Tracing (LKT), a novel framework that integrates pre-trained language models (PLMs) with KT methods. By leveraging the power of language models to capture semantic representations, LKT effectively incorporates textual information and significantly outperforms previous KT models on large benchmark datasets. Moreover, we demonstrate that LKT can effectively address the cold-start problem in KT by leveraging the semantic knowledge captured by PLMs. Interpretability of LKT is enhanced compared to traditional KT models due to its use of text-rich data. We conducted the local interpretable model-agnostic explanation technique and analysis of attention scores to interpret the model performance further. Our work highlights the potential of integrating PLMs with KT and paves the way for future research in KT domain.
Submitted 9 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Generative Pre-Trained Diffusion Paradigm for Zero-Shot Time Series Forecasting
Authors:
Jiarui Yang,
Tao Dai,
Naiqi Li,
Junxi Wu,
Peiyuan Liu,
Jinmin Li,
Jigang Bao,
Haigang Zhang,
Shutao Xia
Abstract:
In recent years, generative pre-trained paradigms such as Large Language Models (LLMs) and Large Vision Models (LVMs) have achieved revolutionary advancements and widespread real-world applications. Particularly, the emergence of pre-trained LLMs-based temporal works, compared to previous deep model approaches, has demonstrated superior generalization and robustness, showcasing the potential of generative pre-trained paradigms as foundation models for time series. However, those LLMs-based works mainly focus on cross-modal research, i.e., leveraging the language capabilities of LLMs in time series contexts. Although they have achieved impressive performance, there still exist the issues of concept drift caused by differences in data distribution and inflexibility caused by misalignment of dimensions. To this end, inspired by recent work on LVMs, we reconsider the paradigm of time series modeling. In this paper, we comprehensively explore, for the first time, the effectiveness and superiority of the Generative Pre-trained Diffusion (GPD) paradigm in real-world multivariate time series forecasting (TSF). Specifically, to mitigate performance bias introduced by sophisticated networks, we propose a straightforward MLP diffusion network for unconditional modeling of time series. Then we employ a zero-shot and tuning-free method to predict (generate) future data using historical data as prompts. The GPD paradigm is established on the time series modality, effectively preventing the phenomenon of concept drift, and enabling flexible forecasting of arbitrary lengths. We demonstrate that the GPD paradigm achieves comprehensive performance and generalization comparable to current SOTA LLM-based and deep model paradigms on mainstream benchmarks and various TSF tasks. Extensive experiments validate the potential of the GPD paradigm and its assistance in future related research.
Submitted 4 June, 2024;
originally announced June 2024.
-
Representation and De-interleaving of Mixtures of Hidden Markov Processes
Authors:
Jiadi Bao,
Mengtao Zhu,
Yunjie Li,
Shafei Wang
Abstract:
De-interleaving of the mixtures of Hidden Markov Processes (HMPs) generally depends on its representation model. Existing representation models consider Markov chain mixtures rather than hidden Markov, resulting in the lack of robustness to non-ideal situations such as observation noise or missing observations. Besides, de-interleaving methods utilize a search-based strategy, which is time-consuming. To address these issues, this paper proposes a novel representation model and corresponding de-interleaving methods for the mixtures of HMPs. At first, a generative model for representing the mixtures of HMPs is designed. Subsequently, the de-interleaving process is formulated as a posterior inference for the generative model. Secondly, an exact inference method is developed to maximize the likelihood of the complete data, and two approximate inference methods are developed to maximize the evidence lower bound by creating tractable structures. Then, a theoretical error probability lower bound is derived using the likelihood ratio test, and the algorithms are shown to get reasonably close to the bound. Finally, simulation results demonstrate that the proposed methods are highly effective and robust for non-ideal situations, outperforming baseline methods on simulated and real-life data.
Submitted 1 June, 2024;
originally announced June 2024.
-
On the viscoelastic-electromagnetic-gravitational analogy
Authors:
Jose' M. Carcione,
Jing Ba
Abstract:
The analogy between electromagnetism and gravitation was achieved by linearizing the tensorial gravitational equations of general relativity and converting them into a vector form corresponding to Maxwell's electromagnetic equations. On this basis, we use the equivalence with viscoelasticity (SH waves) and propose a theory of gravitational waves. We add a damping term to the differential equations, which is equivalent to Ohm's law in electromagnetism and Maxwell's viscosity in viscoelasticity, to describe the attenuation of the waves. A plane-wave analysis gives the phase velocity, the energy velocity, the quality factor and the attenuation factor of the field as well as the energy balance. To obtain these properties, we use the analogy with viscoelasticity; the properties of electromagnetic and gravitational waves are similar to those of shear waves. The presence of attenuation means that the transient field is generally a composition of inhomogeneous (non-uniform) plane waves, where the propagation and attenuation vectors do not point in the same direction and the phase velocity vector and the energy flux (energy velocity) are not collinear. The polarization of cross-plane field is linear and perpendicular to the propagation-attenuation plane, while the polarization of the field within the plane is elliptical. Transient wave fields in the space-time domain are analyzed with the Green function (in homogeneous media) and with a grid method (in heterogeneous media) based on the Fourier method for calculating the spatial derivatives and a Runge-Kutta scheme of order 4 for the time stepping. In the examples, wave propagation at the Sun-Earth and Earth-Moon distances using quadrupole sources is considered in comparison to viscoelastic waves. Finally, an example of propagation in heterogeneous media is presented.
Submitted 31 May, 2024;
originally announced May 2024.
-
To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability
Authors:
Joonhyung Lee,
Jeongin Bae,
Byeongwook Kim,
Se Jung Kwon,
Dongsoo Lee
Abstract:
The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) precision has become the de facto standard for LLM training, with hardware support included in recent accelerators. This trend has gone even further in the latest processors, where FP8 has recently been introduced. However, prior experience with FP16, which was found to be less stable than BF16, raises concerns as to whether FP8, with even fewer bits than FP16, can be a cost-effective option for LLM training. We argue that reduced-precision training schemes must have similar training stability and hyperparameter sensitivities to their higher-precision counterparts in order to be cost-effective. However, we find that currently available methods for FP8 training are not robust enough to allow their use as economical replacements. This prompts us to investigate the stability of reduced-precision LLM training in terms of robustness across random seeds and learning rates. To this end, we propose new evaluation techniques and a new metric for quantifying loss landscape sharpness in autoregressive language models. By simulating incremental bit reductions in floating-point representations, we analyze the relationship between representational power and training stability with the intent of aiding future research into the field.
Submitted 28 May, 2024;
originally announced May 2024.
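
The abstract above mentions simulating incremental bit reductions in floating-point representations. The abstract does not say how the authors do this, so the following is only a minimal sketch under stated assumptions: it starts from a 32-bit float, zeroes the lowest mantissa bits (truncation toward zero, not round-to-nearest), and leaves the exponent width unchanged. The function name `truncate_mantissa` and its parameter `keep_bits` are hypothetical and chosen here for illustration.

```python
import struct


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Return x as a float32 with only `keep_bits` explicit mantissa bits kept.

    Assumption: reduced precision is emulated by masking low mantissa bits
    (truncation toward zero). The 8-bit float32 exponent is left untouched,
    so this does not model the narrower exponent range of FP16 or FP8.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23 for float32")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keep_bits
    # Mask that clears the `drop` least significant mantissa bits.
    mask = (0xFFFFFFFF >> drop) << drop
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


if __name__ == "__main__":
    value = 0.1
    # Sweep from full float32 mantissa width down to a few coarse settings.
    for keep in (23, 10, 7, 3, 2):
        approx = truncate_mantissa(value, keep)
        print(f"{keep:2d} mantissa bits: {approx!r} (abs error {abs(approx - value):.3e})")
```

With truncation, the absolute error cannot decrease as bits are removed, which gives a simple way to sweep precision and observe where a downstream computation starts to degrade. A round-to-nearest variant or an exponent-range clamp would be separate design choices not covered by this sketch.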
-
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
Authors:
Sang Keun Choe,
Hwijeen Ahn,
Juhan Bae,
Kewen Zhao,
Minsoo Kang,
Youngseog Chung,
Adithya Pratapa,
Willie Neiswanger,
Emma Strubell,
Teruko Mitamura,
Jeff Schneider,
Eduard Hovy,
Roger Grosse,
Eric Xing
Abstract:
Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data point to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast training datasets has been largely limited by prohibitive compute and memory costs. In this work, we focus on influence functions, a popular gradient-based data valuation method, and significantly improve its scalability with an efficient gradient projection strategy called LoGra that leverages the gradient structure in backpropagation. We then provide a theoretical motivation of gradient projection approaches to influence functions to promote trust in the data valuation process. Lastly, we lower the barrier to implementing data valuation systems by introducing LogIX, a software package that can transform existing training code into data valuation code with minimal effort. In our data valuation experiments, LoGra achieves competitive accuracy against more expensive baselines while showing up to 6,500x improvement in throughput and 5x reduction in GPU memory usage when applied to Llama3-8B-Instruct and the 1B-token dataset.
Submitted 22 May, 2024;
originally announced May 2024.