-
Metal Price Spike Prediction via a Neurosymbolic Ensemble Approach
Authors:
Nathaniel Lee,
Noel Ngu,
Harshdeep Singh Sahdev,
Pramod Motaganahall,
Al Mehdi Saadat Chowdhury,
Bowen Xi,
Paulo Shakarian
Abstract:
Predicting price spikes in critical metals such as Cobalt, Copper, Magnesium, and Nickel is crucial for mitigating economic risks associated with global trends like the energy transition and reshoring of manufacturing. While traditional models have focused on regression-based approaches, our work introduces a neurosymbolic ensemble framework that integrates multiple neural models with symbolic error detection and correction rules. This framework is designed to enhance predictive accuracy by correcting individual model errors and offering interpretability through rule-based explanations. We show that our method provides up to a 6.42% improvement in precision, a 29.41% increase in recall, and a 13.24% increase in F1 over the best-performing neural models. Further, because our method is based on logical rules, it affords an explanation of which combination of neural models directly contributes to a given prediction.
Submitted 16 October, 2024;
originally announced October 2024.
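For reference, the precision, recall, and F1 metrics quoted in the abstract above can be computed for a binary spike/no-spike task as in the following minimal sketch. The function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = spike)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, so a large gain in recall can translate into a smaller gain in F1 when precision changes little.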
-
Incommensurate Transverse Peierls Transition
Authors:
F. Z. Yang,
K. F. Luo,
Weizhe Zhang,
Xiaoyu Guo,
W. R. Meier,
H. Ni,
H. X. Li,
P. Mercado Lozano,
G. Fabbris,
A. H. Said,
C. Nelson,
T. T. Zhang,
A. F. May,
M. A. McGuire,
R. Juneja,
L. Lindsay,
H. N. Lee,
J. -M. Zuo,
M. F. Chi,
X. Dai,
Liuyan Zhao,
H. Miao
Abstract:
In one-dimensional quantum materials, conducting electrons and the underlying lattices can undergo a spontaneous translational symmetry breaking, known as the Peierls transition. For nearly a century, the Peierls transition has been understood within the paradigm of electron-electron interactions mediated by longitudinal acoustic phonons. This classical picture has recently been revised in topological semimetals, where transverse acoustic phonons can couple with conducting p-orbital electrons and give rise to an unconventional Fermi surface instability, dubbed the transverse Peierls transition (TPT). Most interestingly, the TPT-induced lattice distortions can further break rotation or mirror/inversion symmetries, leading to nematic or chiral charge density waves (CDWs). Quantum materials that host the TPT, however, have not been experimentally established. Here, we report the experimental discovery of an incommensurate TPT in the tetragonal Dirac semimetal EuAl$_4$. Using inelastic x-ray scattering with meV resolution, we observe the complete softening of a transverse acoustic phonon at the CDW wavevector upon cooling, whereas the longitudinal acoustic phonon is nearly unchanged. Combining these measurements with first-principles calculations, we show that the incommensurate CDW wavevector matches the calculated charge susceptibility peak and connects the nested Dirac bands with Al 3$p_{x}$ and 3$p_{y}$ orbitals. Supplemented by second harmonic generation measurements, we show that the CDW-induced lattice distortions break all vertical and diagonal mirrors whereas the four-fold rotational symmetry is retained below the CDW transition. Our observations strongly suggest a chiral CDW in EuAl$_4$ and highlight the TPT as a new avenue for chiral quantum states.
Submitted 14 October, 2024;
originally announced October 2024.
-
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Authors:
Cheol Jun Cho,
Nicholas Lee,
Akshat Gupta,
Dhruv Agarwal,
Ethan Chen,
Alan W Black,
Gopala K. Anumanchipalli
Abstract:
Syllables are compositional units of spoken language that play a crucial role in human speech perception and production. However, current neural speech representations lack structure, resulting in dense token sequences that are costly to process. To bridge this gap, we propose a new model, Sylber, that produces speech representations with clean and robust syllabic structure. Specifically, we propose a self-supervised model that regresses features on syllabic segments distilled from a teacher model which is an exponential moving average of the model in training. This results in a highly structured representation of speech features, offering three key benefits: 1) a fast, linear-time syllable segmentation algorithm, 2) efficient syllabic tokenization with an average of 4.27 tokens per second, and 3) syllabic units better suited for lexical and syntactic understanding. We also train token-to-speech generative models with our syllabic units and show that fully intelligible speech can be reconstructed from these tokens. Lastly, we observe that categorical perception, a linguistic phenomenon of speech perception, emerges naturally in our model, making the embedding space more categorical and sparse than previous self-supervised learning approaches. Together, we present a novel self-supervised approach for representing speech as syllables, with significant potential for efficient speech tokenization and spoken language modeling.
Submitted 9 October, 2024;
originally announced October 2024.
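The abstract above describes a teacher model that is an exponential moving average (EMA) of the model being trained. A minimal sketch of such an update on plain lists of weights is given below; the function name, the flat-list representation, and the decay value are assumptions made for illustration and do not come from the paper.

```python
def ema_update(teacher, student, decay=0.999):
    """Return updated teacher weights: decay * teacher + (1 - decay) * student.

    `decay` is a hypothetical value; the paper's setting is not stated here.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


# Toy weights, for illustration only.
teacher = [0.0, 1.0]
student = [1.0, 1.0]
teacher = ema_update(teacher, student, decay=0.9)
```

With a decay close to 1, the teacher changes slowly and acts as a smoothed copy of the student, which is the usual motivation for using it as a distillation target.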
-
EVOLvE: Evaluating and Optimizing LLMs For Exploration
Authors:
Allen Nie,
Yi Su,
Bo Chang,
Jonathan N. Lee,
Ed H. Chi,
Quoc V. Le,
Minmin Chen
Abstract:
Despite their success in many domains, large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. This is crucial as many real-world applications, ranging from personalized recommendations to healthcare interventions, demand that LLMs not only predict but also actively learn to make optimal decisions through exploration. In this work, we measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. We develop a comprehensive suite of environments, including both context-free and contextual bandits with varying task difficulties, to benchmark LLMs' performance. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs: by providing explicit algorithm-guided support during inference; and through algorithm distillation via in-context demonstrations and fine-tuning, using synthetic data generated from these algorithms. Impressively, these techniques allow us to achieve superior exploration performance with smaller models, surpassing larger models on various tasks. We conducted an extensive ablation study to shed light on various factors, such as task difficulty and data representation, that influence the efficiency of LLM exploration. Additionally, we conduct a rigorous analysis of the LLM's exploration efficiency using the concept of regret, linking its ability to explore to the model size and underlying algorithm.
Submitted 8 October, 2024;
originally announced October 2024.
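To illustrate the bandit setting and the notion of regret mentioned in the abstract above, the following is a minimal sketch of the classical UCB1 algorithm on a context-free Bernoulli bandit. It is not the paper's benchmark or method; the function name, arm means, and horizon are hypothetical choices for illustration.

```python
import math
import random


def run_ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit; return (cumulative regret, pull counts)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of pulls per arm
    totals = [0.0] * n_arms  # summed rewards per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # Choose the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        # Expected regret: gap between the best arm's mean and the chosen arm's mean.
        regret += best_mean - arm_means[arm]

    return regret, counts


regret, counts = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
```

Cumulative regret that grows sublinearly in the horizon indicates that the learner is concentrating its pulls on the best arm over time.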
-
Enabling Clinical Use of Linear Energy Transfer in Proton Therapy for Head and Neck Cancer -- A Review of Implications for Treatment Planning and Adverse Events Study
Authors:
Jingyuan Chen,
Yunze Yang,
Hongying Feng,
Chenbin Liu,
Lian Zhang,
Jason M. Holmes,
Zhengliang Liu,
Haibo Lin,
Tianming Liu,
Charles B. Simone II,
Nancy Y. Lee,
Steven E. Frank,
Daniel J. Ma,
Samir H. Patel,
Wei Liu
Abstract:
Proton therapy offers significant advantages due to its unique physical and biological properties, particularly the Bragg peak, enabling precise dose delivery to tumors while sparing healthy tissues. However, the clinical implementation is challenged by the oversimplification of the relative biological effectiveness (RBE) as a fixed value of 1.1, which does not account for the complex interplay between dose, linear energy transfer (LET), and biological endpoints. Lack of heterogeneity control or the understanding of the complex interplay may result in unexpected adverse events and suboptimal patient outcomes. On the other hand, expanding our knowledge of variable tumor RBE and LET optimization may provide a better management strategy for radioresistant tumors. This review examines recent advancements in LET calculation methods, including analytical models and Monte Carlo simulations. The integration of LET into plan evaluation is assessed to enhance plan quality control. LET-guided robust optimization demonstrates promise in minimizing high-LET exposure to organs at risk, thereby reducing the risk of adverse events. Dosimetric seed spot analysis is discussed to show its importance in revealing the true LET-related effect upon the adverse event initialization by finding the lesion origins and eliminating the confounding factors from the biological processes. Dose-LET volume histograms (DLVH) are discussed as effective tools for correlating physical dose and LET with clinical outcomes, enabling the derivation of clinically relevant dose-LET volume constraints without reliance on uncertain RBE models. Based on DLVH, the dose-LET volume constraints (DLVC)-guided robust optimization is introduced to upgrade conventional dose-volume constraints-based robust optimization, which optimizes the joint distribution of dose and LET simultaneously.
Submitted 6 October, 2024;
originally announced October 2024.
-
Master integrals for $e^{+}e^{-}\rightarrow2γ$ process at large energies and angles
Authors:
Roman N. Lee,
Vyacheslav A. Stotsky
Abstract:
We calculate two-loop massive master integrals for $e^{+}e^{-}\rightarrow2γ$ in terms of generalized power series with respect to electron mass. The coefficients of this series are expressed via Goncharov's polylogarithms. Our approach exploits a number of modern multiloop methods: IBP reduction, differential equations for master integrals, Frobenius method, reduction to $ε$-form, and DRA method.
Submitted 4 October, 2024;
originally announced October 2024.
-
Protein-Mamba: Biological Mamba Models for Protein Function Prediction
Authors:
Bohao Xu,
Yingzhou Lu,
Yoshitaka Inoue,
Namkyeong Lee,
Tianfan Fu,
Jintai Chen
Abstract:
Protein function prediction is a pivotal task in drug discovery, significantly impacting the development of effective and safe therapeutics. Traditional machine learning models often struggle with the complexity and variability inherent in predicting protein functions, necessitating more sophisticated approaches. In this work, we introduce Protein-Mamba, a novel two-stage model that leverages both self-supervised learning and fine-tuning to improve protein function prediction. The pre-training stage allows the model to capture general chemical structures and relationships from large, unlabeled datasets, while the fine-tuning stage refines these insights using specific labeled datasets, resulting in superior prediction performance. Our extensive experiments demonstrate that Protein-Mamba achieves competitive performance, compared with a couple of state-of-the-art methods across a range of protein function datasets. This model's ability to effectively utilize both unlabeled and labeled data highlights the potential of self-supervised learning in advancing protein function prediction and offers a promising direction for future research in drug discovery.
Submitted 22 September, 2024;
originally announced September 2024.
-
NVLM: Open Frontier-Class Multimodal LLMs
Authors:
Wenliang Dai,
Nayeon Lee,
Boxin Wang,
Zhuoling Yang,
Zihan Liu,
Jon Barker,
Tuomas Rintamaki,
Mohammad Shoeybi,
Bryan Catanzaro,
Wei Ping
Abstract:
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training. In terms of model design, we perform a comprehensive comparison between decoder-only multimodal LLMs (e.g., LLaVA) and cross-attention-based models (e.g., Flamingo). Based on the strengths and weaknesses of both approaches, we propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities. Furthermore, we introduce a 1-D tile-tagging design for tile-based dynamic high-resolution images, which significantly boosts performance on multimodal reasoning and OCR-related tasks. Regarding training data, we meticulously curate and provide detailed information on our multimodal pretraining and supervised fine-tuning datasets. Our findings indicate that dataset quality and task diversity are more important than scale, even during the pretraining phase, across all architectures. Notably, we develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks while maintaining and even improving text-only performance compared to their LLM backbones. To achieve this, we craft and integrate a high-quality text-only dataset into multimodal training, alongside a substantial amount of multimodal math and reasoning data, leading to enhanced math and coding capabilities across modalities. To advance research in the field, we are releasing the model weights and will open-source the code for the community: https://nvlm-project.github.io/.
Submitted 17 September, 2024;
originally announced September 2024.
-
Sherpa: An Open Source Python Fitting Package
Authors:
Aneta Siemiginowska,
Douglas Burke,
Hans Moritz Günther,
Nicholas P. Lee,
Warren McLaughlin,
David A. Principe,
Harlan Cheer,
Antonella Fruscione,
Omar Laurino,
Jonathan McDowell,
Marie Terrell
Abstract:
We present an overview of Sherpa, an open source Python project, and discuss its development history, broad design concepts and capabilities. Sherpa contains powerful tools for combining parametric models into complex expressions that can be fit to data using a variety of statistics and optimization methods. It is easily extensible to include user-defined models, statistics, and optimization methods. It provides a high-level User Interface for interactive data-analysis, such as within a Jupyter notebook, and it can also be used as a library component, providing fitting and modeling capabilities to an application. We include a few examples of Sherpa applications to multiwavelength astronomical data. The code is available on GitHub: https://github.com/sherpa/sherpa
Submitted 16 September, 2024;
originally announced September 2024.
-
Self-supervised Learning for Acoustic Few-Shot Classification
Authors:
Jingyong Liang,
Bernd Meyer,
Issac Ning Lee,
Thanh-Toan Do
Abstract:
Labelled data are limited and self-supervised learning is one of the most important approaches for reducing labelling requirements. While it has been extensively explored in the image domain, it has so far not received the same amount of attention in the acoustic domain. Yet, reducing labelling is a key requirement for many acoustic applications. Specifically, in bioacoustics, sufficient labels for fully supervised learning are rarely available. This has led to the widespread use of acoustic recognisers that have been pre-trained on unrelated data for bioacoustic tasks. We posit that training on the actual task data and combining self-supervised pre-training with few-shot classification is a superior approach that has the ability to deliver high accuracy even when only a few labels are available. To this end, we introduce and evaluate a new architecture that combines CNN-based preprocessing with feature extraction based on state space models (SSMs). This combination is motivated by the fact that CNN-based networks alone struggle to capture temporal information effectively, which is crucial for classifying acoustic signals. SSMs, specifically S4 and Mamba, on the other hand, have been shown to have an excellent ability to capture long-range dependencies in sequence data. We pre-train this architecture using contrastive learning on the actual task data and subsequently fine-tune it with an extremely small amount of labelled data. We evaluate the performance of this proposed architecture for ($n$-shot, $n$-class) classification on standard benchmarks as well as real-world data. Our evaluation shows that it outperforms state-of-the-art architectures on the few-shot classification problem.
Submitted 15 September, 2024;
originally announced September 2024.
-
Low-Earth Orbit Satellite Network Analysis: Coverage under Distance-Dependent Shadowing
Authors:
Jinseok Choi,
Jeonghun Park,
Junse Lee,
Namyoon Lee
Abstract:
This paper offers a thorough analysis of the coverage performance of Low Earth Orbit (LEO) satellite networks using a strongest satellite association approach, with a particular emphasis on shadowing effects modeled through a Poisson point process (PPP)-based network framework. We derive an analytical expression for the coverage probability, which incorporates key system parameters and a distance-dependent shadowing probability function, explicitly accounting for both line-of-sight and non-line-of-sight propagation channels. To enhance the practical relevance of our findings, we provide both lower and upper bounds for the coverage probability and introduce a closed-form solution based on a simplified shadowing model. Our analysis reveals several important network design insights, including the enhancement of coverage probability by distance-dependent shadowing effects and the identification of an optimal satellite altitude that balances beam gain benefits with interference drawbacks. Notably, our PPP-based network model shows strong alignment with other established models, confirming its accuracy and applicability across a variety of satellite network configurations. The insights gained from our analysis are valuable for optimizing LEO satellite deployment strategies and improving network performance in diverse scenarios.
Submitted 5 September, 2024;
originally announced September 2024.
-
Defects and type D relativistic Toda lattice for some 5d gauge theories
Authors:
Kimyeong Lee,
Norton Lee
Abstract:
We perform folding on the ADHM construction of the instanton moduli space from $SU$ to $SO$ group. A Young diagram description for the $SO$ instanton is obtained after modifying the real and complex moment maps of the ADHM data. We study the Bethe gauge correspondence between type D relativistic Toda lattice and 5d $\mathcal{N}=1$ folded theory. In particular we prove that the regular monodromy defect in the folded gauge theory is the stationary wavefunction of the type D relativistic Toda lattice.
Submitted 5 September, 2024;
originally announced September 2024.
-
Software Verification with CPAchecker 3.0: Tutorial and User Guide (Extended Version)
Authors:
Daniel Baier,
Dirk Beyer,
Po-Chun Chien,
Marie-Christine Jakobs,
Marek Jankola,
Matthias Kettl,
Nian-Ze Lee,
Thomas Lemberger,
Marian Lingsch-Rosenfeld,
Henrik Wachowitz,
Philipp Wendler
Abstract:
This tutorial provides an introduction to CPAchecker for users. CPAchecker is a flexible and configurable framework for software verification and testing. The framework provides many abstract domains, such as BDDs, explicit values, intervals, memory graphs, and predicates, and many program-analysis and model-checking algorithms, such as abstract interpretation, bounded model checking, Impact, interpolation-based model checking, k-induction, PDR, predicate abstraction, and symbolic execution. This tutorial presents basic use cases for CPAchecker in formal software verification, focusing on its main verification techniques with their strengths and weaknesses. It also shows further use cases of CPAchecker for test-case generation and witness-based result validation. The envisioned readers are assumed to possess a background in automatic formal verification and program analysis, but prior knowledge of CPAchecker is not required. This tutorial and user guide is based on CPAchecker in version 3.0. This user guide's latest version and other documentation are available at https://cpachecker.sosy-lab.org/doc.php.
Submitted 3 September, 2024;
originally announced September 2024.
-
TinyAgent: Function Calling at the Edge
Authors:
Lutfi Eren Erdogan,
Nicholas Lee,
Siddharth Jha,
Sehoon Kim,
Ryan Tabrizi,
Suhong Moon,
Coleman Hooper,
Gopala Anumanchipalli,
Kurt Keutzer,
Amir Gholami
Abstract:
Recent large language models (LLMs) have enabled the development of advanced agentic systems that can integrate various tools and APIs to fulfill user queries through function calling. However, the deployment of these LLMs on the edge has not been explored since they typically require cloud-based infrastructure due to their substantial model size and computational demands. To this end, we present TinyAgent, an end-to-end framework for training and deploying task-specific small language model agents capable of function calling for driving agentic systems at the edge. We first show how to enable accurate function calling for open-source models via the LLMCompiler framework. We then systematically curate a high-quality dataset for function calling, which we use to fine-tune two small language models, TinyAgent-1.1B and 7B. For efficient inference, we introduce a novel tool retrieval method to reduce the input prompt length and utilize quantization to further accelerate the inference speed. As a driving application, we demonstrate a local Siri-like system for Apple's MacBook that can execute user commands through text or voice input. Our results show that our models can achieve, and even surpass, the function-calling capabilities of larger models like GPT-4-Turbo, while being fully deployed at the edge. We open-source our dataset, models, and installable package and provide a demo video for our MacBook assistant agent.
Submitted 21 October, 2024; v1 submitted 1 September, 2024;
originally announced September 2024.
-
Isolation and characterization of atomically thin mica phyllosilicates
Authors:
Kristine L. Haley,
Noah F. Lee,
Vergil M. Schreiber,
Nicholas T. Pereira,
Randy M. Sterbentz,
Timothy Y. Chung,
Joshua O. Island
Abstract:
One of the roadblocks to employing two-dimensional (2D) materials in next generation devices is the lack of high quality insulators. Insulating layered materials with inert and atomically flat surfaces are ideal for high performance transistors and this has been exemplified with commonly used boron nitride. While the list of insulating 2D materials is limited, the earth-abundant phyllosilicates are particularly attractive candidates. Here, we investigate the properties of atomically thin biotite and muscovite, the most common and commercially important micas from the rock-forming minerals. From a group of five natural bulk samples, energy dispersive X-ray spectroscopy is used to classify exfoliated flakes into three types of biotite, including the phlogopite endmember, and two muscovites. We provide a catalog of RGB contrast values for exfoliated flakes ranging from bilayer to approximately 175 nm. Additionally, we report the complex index of refraction for all investigated materials based on micro-reflectance measurements. Our findings suggest that earth-abundant phyllosilicates could serve as scalable insulators for logic devices employing 2D materials, potentially overcoming current limitations in the field.
Submitted 23 August, 2024;
originally announced August 2024.
-
Spectrum Sharing Between Low Earth Orbit Satellite and Terrestrial Networks: A Stochastic Geometry Perspective Analysis
Authors:
Daeun Kim,
Jeonghun Park,
Jinseok Choi,
Namyoon Lee
Abstract:
Low Earth orbit (LEO) satellite networks with mega constellations have the potential to provide 5G and beyond services ubiquitously. However, these networks may introduce mutual interference to both satellite and terrestrial networks, particularly when sharing spectrum resources. In this paper, we present a system-level performance analysis to address these interference issues using the tool of stochastic geometry. We model the spatial distributions of satellites, satellite users, terrestrial base stations (BSs), and terrestrial users using independent Poisson point processes on the surfaces of concentric spheres. Under these spatial models, we derive analytical expressions for the ergodic spectral efficiency of uplink (UL) and downlink (DL) satellite networks when they share spectrum with both UL and DL terrestrial networks. These derived ergodic expressions capture comprehensive network parameters, including the densities of satellite and terrestrial networks, the path-loss exponent, and fading. From our analysis, we determine the conditions under which spectrum sharing with UL terrestrial networks is advantageous for both UL and DL satellite networks. Our key finding is that the optimal spectrum sharing configuration among the four possible configurations depends on the density ratio between terrestrial BSs and users, providing a design guideline for spectrum management. Simulation results confirm the accuracy of our derived expressions.
Submitted 22 August, 2024;
originally announced August 2024.
-
Off-Policy Reinforcement Learning with High Dimensional Reward
Authors:
Dong Neuck Lee,
Michael R. Kosorok
Abstract:
Conventional off-policy reinforcement learning (RL) focuses on maximizing the expected return of scalar rewards. Distributional RL (DRL), in contrast, studies the distribution of returns with the distributional Bellman operator in a Euclidean space, leading to highly flexible choices for utility. This paper establishes robust theoretical foundations for DRL. We prove the contraction property of the Bellman operator even when the reward space is an infinite-dimensional separable Banach space. Furthermore, we demonstrate that the behavior of high- or infinite-dimensional returns can be effectively approximated using a lower-dimensional Euclidean space. Leveraging these theoretical insights, we propose a novel DRL algorithm that tackles problems which have been previously intractable using conventional reinforcement learning approaches.
Submitted 14 August, 2024;
originally announced August 2024.
-
Sparsely Pre-transformed Polar Codes for Low-Latency SCL Decoding
Authors:
Geon Choi,
Namyoon Lee
Abstract:
Deep polar codes, employing multi-layered polar kernel pre-transforms in series, are recently introduced variants of pre-transformed polar codes. These codes have demonstrated the ability to reduce the number of minimum weight codewords, thereby closely achieving finite-block length capacity with successive cancellation list (SCL) decoders in certain scenarios. However, when the list size of the SCL decoder is small, which is crucial for low-latency communication applications, the reduction in the number of minimum weight codewords does not necessarily improve decoding performance. To address this limitation, we propose an alternative pre-transform technique to enhance the suitability of polar codes for SCL decoders with practical list sizes. Leveraging the fact that the SCL decoding error event set can be decomposed into two exclusive error event sets, our approach applies two different types of pre-transformations, each targeting the reduction of one of the two error event sets. Extensive simulation results under various block lengths and code rates have demonstrated that our codes consistently outperform all existing state-of-the-art pre-transformed polar codes, including CRC-aided polar codes and polarization-adjusted convolutional codes, when decoded using SCL decoders with small list sizes.
Submitted 9 August, 2024;
originally announced August 2024.
-
Debiased Graph Poisoning Attack via Contrastive Surrogate Objective
Authors:
Kanghoon Yoon,
Yeonjun In,
Namkyeong Lee,
Kibum Kim,
Chanyoung Park
Abstract:
Graph neural networks (GNNs) are vulnerable to adversarial attacks, which aim to degrade the performance of GNNs through imperceptible changes on the graph. However, we find that in fact the prevalent meta-gradient-based attacks, which utilize the gradient of the loss w.r.t. the adjacency matrix, are biased towards training nodes. That is, their meta-gradient is determined by a training procedure of the surrogate model, which is solely trained on the training nodes. This bias manifests as an uneven perturbation, connecting two nodes when at least one of them is a labeled node, i.e., training node, while it is unlikely to connect two unlabeled nodes. However, these biased attack approaches are sub-optimal as they do not consider flipping edges between two unlabeled nodes at all. This means that they miss the potential attacked edges between unlabeled nodes that significantly alter the representation of a node. In this paper, we investigate the meta-gradients to uncover the root cause of the uneven perturbations of existing attacks. Based on our analysis, we propose a Meta-gradient-based attack method using contrastive surrogate objective (Metacon), which alleviates the bias in meta-gradient using a new surrogate loss. We conduct extensive experiments to show that Metacon outperforms existing meta-gradient-based attack methods on benchmark datasets, while showing that alleviating the bias towards training nodes is effective in attacking the graph structure.
Submitted 26 July, 2024;
originally announced July 2024.
-
MoXIchecker: An Extensible Model Checker for MoXI
Authors:
Salih Ates,
Dirk Beyer,
Po-Chun Chien,
Nian-Ze Lee
Abstract:
MoXI is a new intermediate verification language introduced in 2024 to promote the standardization and open-source implementations for symbolic model checking by extending the SMT-LIB 2 language with constructs to define state-transition systems. The tool suite of MoXI provides a translator from MoXI to Btor2, which is a lower-level intermediate language for hardware verification, and a translation-based model checker, which invokes mature hardware model checkers for Btor2 to analyze the translated verification tasks. The extensibility of such a translation-based model checker is restricted because more complex theories, such as integer or real arithmetics, cannot be precisely expressed with bit-vectors of fixed lengths in Btor2. We present MoXIchecker, the first model checker that solves MoXI verification tasks directly. Instead of translating MoXI to lower-level languages, MoXIchecker uses the solver-agnostic library PySMT for SMT solvers as backend for its verification algorithms. MoXIchecker is extensible because it accommodates verification tasks involving more complex theories, not limited by lower-level languages, facilitates the implementation of new algorithms, and is solver-agnostic by using the API of PySMT. In our evaluation, MoXIchecker uniquely solved tasks that use integer or real arithmetics, and achieved a comparable performance against the translation-based model checker from the MoXI tool suite.
Submitted 22 July, 2024;
originally announced July 2024.
-
Consent in Crisis: The Rapid Decline of the AI Data Commons
Authors:
Shayne Longpre,
Robert Mahari,
Ariel Lee,
Campbell Lund,
Hamidah Oderinwale,
William Brannon,
Nayan Saxena,
Naana Obeng-Marnu,
Tobin South,
Cole Hunter,
Kevin Klyman,
Christopher Klamm,
Hailey Schoelkopf,
Nikhil Singh,
Manuel Cherep,
Ahmad Anis,
An Dinh,
Caroline Chitongo,
Da Yin,
Damien Sileo,
Deividas Mataciunas,
Diganta Misra,
Emad Alghamdi,
Enrico Shippole,
Jianguo Zhang
, et al. (24 additional authors not shown)
Abstract:
General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crises in data consent, for both developers and creators. The foreclosure of much of the open web will impact not only commercial AI, but also non-commercial AI and academic research.
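As a rough illustration of how a robots.txt file can encode an AI-specific restriction of the kind audited above, the following minimal sketch uses Python's standard `urllib.robotparser` module. The robots.txt content, the crawler names, and the URL are illustrative assumptions, not data or tooling from the paper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named AI crawler is blocked site-wide,
# while every other user agent is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt, agents, url):
    """Return {agent: allowed} for each user agent under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# "ExampleSearchBot" is a made-up agent name that matches only the "*" rule.
permissions = crawl_permissions(
    ROBOTS_TXT,
    ["GPTBot", "ExampleSearchBot"],
    "https://example.com/articles/1",
)
print(permissions)
```

A check like this covers only the robots.txt side of the audit. Restrictions expressed in a site's Terms of Service are not machine-readable in this way, which is one source of the inconsistencies the abstract describes.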
Submitted 24 July, 2024; v1 submitted 20 July, 2024;
originally announced July 2024.
-
Polylogarithmic functions with prescribed branching locus and linear relations between them
Authors:
Roman N. Lee
Abstract:
We consider the problem of finding the set of classical polylogarithmic functions $\text{Li}_n$ with branching locus determined by the solution of $p_1\cdot p_2\cdot \ldots \cdot p_n=0$, where $p_1,\ldots, p_n$ are irreducible polynomials of several variables. We present an algorithm for constructing a complete set of possible arguments of $\text{Li}_n$ functions. The corresponding Mathematica code is included as an ancillary file. Using this algorithm and the symbol map, we provide some examples of polylogarithmic identities.
Submitted 17 July, 2024;
originally announced July 2024.
-
3D Geometric Shape Assembly via Efficient Point Cloud Matching
Authors:
Nahyuk Lee,
Juhong Min,
Junha Lee,
Seungwook Kim,
Kanghee Lee,
Jaesik Park,
Minsu Cho
Abstract:
Learning to assemble geometric shapes into a larger target structure is a pivotal task in various practical applications. In this work, we tackle this problem by establishing local correspondences between point clouds of part shapes in both coarse- and fine-levels. To this end, we introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts while incurring low costs in memory and computation. Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task. We evaluate the proposed PMTR on the large-scale 3D geometric shape assembly benchmark dataset of Breaking Bad and demonstrate its superior performance and efficiency compared to state-of-the-art methods. Project page: https://nahyuklee.github.io/pmtr.
Submitted 15 July, 2024;
originally announced July 2024.
-
Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights
Authors:
Seyong Kim,
Jinseok Choi,
Wonjae Shin,
Namyoon Lee,
Jeonghun Park
Abstract:
To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed beams without relying on CSI, then selects a suitable user set for each beam. Building on this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by which inter-beam interference is efficiently mitigated by narrowing the corresponding beam width. By modeling the ground users' locations via a Poisson point process, we rigorously analyze the achievable performance of the presented multibeam satellite system. In particular, we investigate the asymptotic scaling laws that reveal the interplay between the user density, the number of beams, and the number of antennas. Our analysis offers critical design insights for the multibeam satellite with massive MIMO: i) If the user density scales in power with the number of antennas, the considered precoding can achieve a linear fraction of the optimal rate in the asymptotic regime. ii) A certain additional scaling factor for the user density is needed as the number of beams increases to maintain the asymptotic optimality.
Submitted 15 July, 2024;
originally announced July 2024.
-
Vision Language Model is NOT All You Need: Augmentation Strategies for Molecule Language Models
Authors:
Namkyeong Lee,
Siddhartha Laghuvarapu,
Chanyoung Park,
Jimeng Sun
Abstract:
Recently, there has been a growing interest among researchers in understanding molecules and their textual descriptions through molecule language models (MoLM). However, despite some early promising developments, the advancement of MoLM still trails significantly behind that of vision language models (VLM). This is because unique challenges exist apart from VLM in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that occurred due to the specialized areas of focus among the experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with structural similarity preserving loss, and 2) transfers the expertise between the molecules. Specifically, AMOLE enriches molecule-text pairs by sharing descriptions among structurally similar molecules with a novel structural similarity preserving loss. Moreover, we propose an expertise reconstruction loss to transfer knowledge from molecules that have extensive expertise to those with less expertise. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery. The source code for AMOLE is available at https://github.com/Namkyeong/AMOLE.
Submitted 23 July, 2024; v1 submitted 12 July, 2024;
originally announced July 2024.
-
Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization
Authors:
Sungbin Shin,
Wonpyo Park,
Jaeho Lee,
Namhoon Lee
Abstract:
This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The way it is done is by divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can significantly reduce this error by more than $90\%$. Unwittingly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in rather increased language perplexity and poor performance at downstream tasks. We find out that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions in the presence of both benefits and pitfalls of reconstruction for pruning LLMs.
Submitted 10 October, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
Authors:
Junho Myung,
Nayeon Lee,
Yi Zhou,
Jiho Jin,
Rifki Afina Putri,
Dimosthenis Antypas,
Hsuvas Borkakoty,
Eunsu Kim,
Carla Perez-Almendros,
Abinew Ali Ayele,
Víctor Gutiérrez-Basulto,
Yazmín Ibáñez-García,
Hwaran Lee,
Shamsuddeen Hassan Muhammad,
Kiwoong Park,
Anar Sabuhi Rzayev,
Nina White,
Seid Muhie Yimam,
Mohammad Taher Pilehvar,
Nedjma Ousidhoum,
Jose Camacho-Collados,
Alice Oh
Abstract:
Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food people eat for their birthday celebrations, spices they typically use, musical instruments youngsters play, or the sports they practice in school is common cultural knowledge but uncommon in easily collected online sources, especially for underrepresented cultures. To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages. BLEnD comprises 52.6k question-answer pairs from 16 countries/regions, in 13 different languages, including low-resource ones such as Amharic, Assamese, Azerbaijani, Hausa, and Sundanese. We construct the benchmark to include two formats of questions: short-answer and multiple-choice. We show that LLMs perform better for cultures that are highly represented online, with a maximum 57.34% difference in GPT-4, the best-performing model, in the short-answer format. For cultures represented by mid-to-high-resource languages, LLMs perform better in their local languages, but for cultures represented by low-resource languages, LLMs perform better in English than the local languages. We make our dataset publicly available at: https://github.com/nlee0212/BLEnD.
Submitted 14 June, 2024;
originally announced June 2024.
-
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Authors:
Jiwoo Hong,
Sayak Paul,
Noah Lee,
Kashif Rasul,
James Thorne,
Jongheon Jeong
Abstract:
Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy. Motivated by this observation, we propose a novel and memory-friendly preference alignment method for diffusion models that does not depend on any reference model, coined margin-aware preference optimization (MaPO). MaPO jointly maximizes the likelihood margin between the preferred and dispreferred image sets and the likelihood of the preferred sets, simultaneously learning general stylistic features and preferences. For evaluation, we introduce two new pairwise preference datasets, which comprise self-generated image pairs from SDXL, Pick-Style and Pick-Safety, simulating diverse scenarios of reference mismatch. Our experiments validate that MaPO can significantly improve alignment on Pick-Style and Pick-Safety and general preference alignment when used with Pick-a-Pic v2, surpassing the base SDXL and other existing methods. Our code, models, and datasets are publicly available via https://mapo-t2i.github.io
Submitted 10 June, 2024;
originally announced June 2024.
-
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Authors:
Seungone Kim,
Juyoung Suk,
Ji Yong Cho,
Shayne Longpre,
Chaeeun Kim,
Dongkeun Yoon,
Guijin Son,
Yejin Cho,
Sheikh Shafayat,
Jinheon Baek,
Sue Hyun Park,
Hyeonbin Hwang,
Jinkyung Jo,
Hyowon Cho,
Haebin Shin,
Seongyun Lee,
Hanseok Oh,
Noah Lee,
Namgyu Ho,
Se June Joo,
Miyoung Ko,
Yoonjoo Lee,
Hyungjoo Chae,
Jamin Shin,
Joel Jang
, et al. (7 additional authors not shown)
Abstract:
As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench.
Submitted 9 June, 2024;
originally announced June 2024.
-
Dimers for Type D Relativistic Toda Model
Authors:
Kimyeong Lee,
Norton Lee
Abstract:
We construct dimer graphs for type D relativistic Toda models by introducing impurities to the $Y^{2N,0}$ square dimer graphs. By properly placing the impurities and changing the canonical variables assigned to the 1-loops on the dimer graph, we introduce the "folding" of the graphs and obtain the type D relativistic Toda lattice Hamiltonian and monodromy matrix.
Submitted 2 September, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
FDD Massive MIMO: How to Optimally Combine UL Pilot and Limited DL CSI Feedback?
Authors:
Jungyeon Kim,
Jinseok Choi,
Jeonghun Park,
Ahmed Alkhateeb,
Namyoon Lee
Abstract:
In frequency-division duplexing (FDD) multiple-input multiple-output (MIMO) systems, obtaining accurate downlink channel state information (CSI) for precoding is vastly challenging due to the tremendous feedback overhead with the growing number of antennas. Utilizing uplink pilots for downlink CSI estimation is a promising approach that can eliminate CSI feedback. However, the downlink CSI estimation accuracy diminishes significantly as the number of channel paths increases, resulting in reduced spectral efficiency. In this paper, we demonstrate that achieving downlink spectral efficiency comparable to perfect CSI is feasible by combining uplink CSI with limited downlink CSI feedback information. Our proposed downlink CSI feedback strategy transmits quantized phase information of downlink channel paths, deviating from conventional limited methods. We put forth a mean square error (MSE)-optimal downlink channel reconstruction method by jointly exploiting the uplink CSI and the limited downlink CSI. Armed with the MSE-optimal estimator, we derive the MSE as a function of the number of feedback bits for phase quantization. Subsequently, we present an optimal feedback bit allocation method for minimizing the MSE in the reconstructed channel through phase quantization. Utilizing a robust downlink precoding technique, we establish that the proposed downlink channel reconstruction method is sufficient for attaining a sum-spectral efficiency comparable to perfect CSI.
Submitted 14 May, 2024;
originally announced May 2024.
-
ECC Analyzer: Extract Trading Signal from Earnings Conference Calls using Large Language Model for Stock Performance Prediction
Authors:
Yupeng Cao,
Zhi Chen,
Qingyun Pei,
Nathan Jinseok Lee,
K. P. Subbalakshmi,
Papa Momar Ndiaye
Abstract:
In the realm of financial analytics, leveraging unstructured data, such as earnings conference calls (ECCs), to forecast stock volatility is a critical challenge that has attracted both academics and investors. While previous studies have used multimodal deep learning-based models to obtain a general view of ECCs for volatility prediction, they often fail to capture detailed, complex information. Our research introduces a novel framework, ECC Analyzer, which utilizes large language models (LLMs) to extract richer, more predictive content from ECCs to aid the model's prediction performance. We use the pre-trained large models to extract textual and audio features from ECCs and implement a hierarchical information extraction strategy to extract more fine-grained information. This strategy first extracts paragraph-level general information by summarizing the text and then extracts fine-grained focus sentences using Retrieval-Augmented Generation (RAG). These features are then fused through multimodal feature fusion to perform volatility prediction. Experimental results demonstrate that our model outperforms traditional analytical benchmarks, confirming the effectiveness of advanced LLM techniques in financial analysis.
Submitted 29 August, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
A Bayesian Approach for Prioritising Driving Behaviour Investigations in Telematic Auto Insurance Policies
Authors:
Mark McLeod,
Bernardo Perez-Orozco,
Nika Lee,
Davide Zilli
Abstract:
Automotive insurers increasingly have access to telematic information via black-box recorders installed in the insured vehicle, and wish to identify undesirable behaviour which may signify increased risk or uninsured activities. However, identification of such behaviour with machine learning is non-trivial, and results are far from perfect, requiring human investigation to verify suspected cases. An appropriately formed priority score, generated by automated analysis of GPS data, allows underwriters to make more efficient use of their time, improving detection of the behaviour under investigation.
An example of such behaviour is the use of a privately insured vehicle for commercial purposes, such as delivering meals and parcels. We first make use of trip GPS and accelerometer data, augmented by geospatial information, to train an imperfect classifier for delivery driving on a per-trip basis. We make use of a mixture of Beta-Binomial distributions to model the propensity of a policyholder to undertake trips which result in a positive classification as being drawn from either a rare high-scoring or common low-scoring group, and learn the parameters of this model using MCMC. This model provides us with a posterior probability that any policyholder will be a regular generator of automated alerts given any number of trips and alerts. This posterior probability is converted to a priority score, which was used to select the most valuable candidates for manual investigation.
Testing over a 1-year period ranked policyholders by likelihood of commercial driving activity on a weekly basis. The top 0.9% have been reviewed at least once by the underwriters at the time of writing, and of those 99.4% have been confirmed as correctly identified, showing the approach has achieved a significant improvement in efficiency of human resource allocation compared to manual searching.
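The posterior step described in this abstract can be sketched for a two-component Beta-Binomial mixture as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mixture weight and Beta parameters below are hypothetical fixed values (the paper learns its parameters with MCMC), and the function and variable names are invented for the example.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_beta_binomial(k, n, a, b):
    """Log-probability of k alerts in n trips under a Beta-Binomial(n, a, b)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def posterior_high_group(k, n, weight_high, high, low):
    """Posterior probability that a policyholder with k alerts in n trips
    belongs to the rare high-scoring component of the mixture."""
    log_h = log(weight_high) + log_beta_binomial(k, n, *high)
    log_l = log(1.0 - weight_high) + log_beta_binomial(k, n, *low)
    m = max(log_h, log_l)  # subtract the max for numerical stability
    return exp(log_h - m) / (exp(log_h - m) + exp(log_l - m))


# Hypothetical parameters (assumed for illustration, not taken from the paper).
WEIGHT_HIGH = 0.02   # prior share of the rare high-scoring group
HIGH = (8.0, 4.0)    # Beta(a, b) for the high-scoring group
LOW = (1.0, 30.0)    # Beta(a, b) for the common low-scoring group


def priority_scores(policyholders):
    """Rank (name, alerts, trips) records by posterior probability, highest first."""
    scored = [
        (posterior_high_group(k, n, WEIGHT_HIGH, HIGH, LOW), name)
        for name, k, n in policyholders
    ]
    return sorted(scored, reverse=True)


ranking = priority_scores([("policy_a", 1, 40), ("policy_b", 25, 40)])
```

Because the posterior conditions on both the number of trips and the number of alerts, a policyholder with a few alerts over very few trips is not automatically ranked above one with a sustained alert rate over many trips.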
Submitted 22 April, 2024;
originally announced April 2024.
-
NNLO QCD corrections to polarized semi-inclusive DIS
Authors:
Saurav Goyal,
Roman N. Lee,
Sven-Olaf Moch,
Vaibhav Pathak,
Narayan Rana,
V. Ravindran
Abstract:
Polarized semi-inclusive deep-inelastic scattering (SIDIS) is a key process in the quest for a resolution of the proton spin puzzle. We present the complete results for the polarized SIDIS process at next-to-next-to-leading order (NNLO) in perturbative quantum chromodynamics. Our analytical results include all partonic channels for the scattering of polarized leptons off hadrons and a spin-averaged hadron identified in the final state. A numerical analysis of the NNLO corrections illustrates their significance and the reduced residual scale dependence in the kinematic range probed by the future Electron-Ion Collider (EIC).
Submitted 15 April, 2024;
originally announced April 2024.
-
Magnetic fields from small-scale primordial perturbations
Authors:
Nanoom Lee,
Yacine Ali-Haimoud
Abstract:
Weak magnetic fields must have existed in the early Universe, as they were sourced by the cross product of electron density and temperature gradients through the Biermann-battery mechanism. In this paper we calculate the magnetic fields generated at cosmic dawn by a variety of small-scale primordial perturbations, carefully computing the evolution of electron density and temperature fluctuations, and consistently accounting for relative velocities between baryons and dark matter. We first compute the magnetic field resulting from standard, nearly scale-invariant primordial adiabatic perturbations, making significant improvements to previous calculations. This "standard" primordial field has a root mean square (rms) of $\sim10^{-15}$ nG at $20\lesssim z \lesssim 100$, with fluctuations on $\sim$ kpc comoving scales, and could serve as the seed of present-day magnetic fields observed in galaxies and galaxy clusters. In addition, we consider early-Universe magnetic fields as a possible probe of non-standard initial conditions of the Universe on small scales $k \sim 1-10^3$ Mpc$^{-1}$. To this end, we compute the maximally-allowed magnetic fields within current upper limits on small-scale adiabatic and isocurvature perturbations. Under the current Cosmic Microwave Background spectral-distortion constraints magnetic fields could be produced with a rms of $\sim 5\times 10^{-11}$ nG at $z = 20$. Uncorrelated small-scale isocurvature perturbations within current Big-Bang Nucleosynthesis bounds could potentially enhance the magnetic field to $\sim 10^{-14}-10^{-10}$ nG at $z = 20$, depending on the specific isocurvature mode considered. While these very weak fields remain well below current observational capabilities, our work points out that magnetic fields could potentially provide an interesting window into the poorly constrained small-scale initial conditions of the Universe.
Submitted 4 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Generalized Calogero-Moser system and supergroup gauge origami
Authors:
Taro Kimura,
Norton Lee
Abstract:
We study the integrability and the Bethe/Gauge correspondence of the Generalized Calogero-Moser system proposed by Berntson, Langmann and Lenells, which we call the elliptic quadruple Calogero-Moser system (eqCM). We write down the Dunkl operators, which give commuting Hamiltonians of the quantum integrable system. We identify the corresponding gauge theory as a supergroup version of the gauge origami, from which we construct the transfer matrix of the eqCM system.
Submitted 30 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Measuring Political Bias in Large Language Models: What Is Said and How It Is Said
Authors:
Yejin Bang,
Delong Chen,
Nayeon Lee,
Pascale Fung
Abstract:
We propose to measure political bias in LLMs by analyzing both the content and style of their generated content regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine-grained and explainable measures of political biases generated by LLMs. Our proposed measure looks at different political issues such as reproductive rights and climate change, at both the content (the substance of the generation) and the style (the lexical polarity) of such bias. We measured the political bias in eleven open-sourced LLMs and showed that our proposed framework is easily scalable to other topics and is explainable.
Submitted 27 March, 2024;
originally announced March 2024.
-
SignSGD with Federated Voting
Authors:
Chanho Park,
H. Vincent Poor,
Namyoon Lee
Abstract:
Distributed learning is commonly used for accelerating model training by harnessing the computational capabilities of multiple edge devices. However, in practical applications, the communication delay emerges as a bottleneck due to the substantial information exchange required between workers and a central parameter server. SignSGD with majority voting (signSGD-MV) is an effective distributed learning algorithm that can significantly reduce communication costs by one-bit quantization. However, due to heterogeneous computational capabilities, it fails to converge when the mini-batch sizes differ among workers. To overcome this, we propose a novel signSGD optimizer with \textit{federated voting} (signSGD-FV). The idea of federated voting is to exploit learnable weights to perform weighted majority voting. The server learns the weights assigned to the edge devices in an online fashion based on their computational capabilities. Subsequently, these weights are employed to decode the signs of the aggregated local gradients in such a way as to minimize the sign decoding error probability. We provide a unified convergence rate analysis framework applicable to scenarios where the estimated weights are known to the parameter server either perfectly or imperfectly. We demonstrate that the proposed signSGD-FV algorithm has a theoretical convergence guarantee even when edge devices use heterogeneous mini-batch sizes. Experimental results show that signSGD-FV outperforms signSGD-MV, exhibiting a faster convergence rate, especially with heterogeneous mini-batch sizes.
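The weighted majority voting idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the weights here are log-likelihood ratios computed from assumed per-worker sign error probabilities, a standard choice for combining independent binary votes, whereas the paper learns its weights online. The names `llr_weight` and `weighted_sign_vote` are hypothetical.

```python
import math


def llr_weight(error_prob):
    """Log-likelihood-ratio weight for a worker whose transmitted sign is
    wrong with probability error_prob (assumed known here, 0 < p < 0.5)."""
    return math.log((1.0 - error_prob) / error_prob)


def weighted_sign_vote(worker_signs, weights):
    """Decode one sign per coordinate from one-bit worker messages.

    worker_signs: list of per-worker lists with entries in {+1, -1}.
    weights:      one non-negative weight per worker.
    Ties are resolved toward +1 (an arbitrary choice in this sketch).
    """
    dim = len(worker_signs[0])
    decoded = []
    for j in range(dim):
        score = sum(w * signs[j] for w, signs in zip(weights, worker_signs))
        decoded.append(1 if score >= 0 else -1)
    return decoded


signs = [[1, -1], [-1, -1], [-1, 1]]

# Plain majority voting (equal weights), as in signSGD-MV.
majority = weighted_sign_vote(signs, [1.0, 1.0, 1.0])

# Weighted voting: the first worker is assumed far more reliable, for
# example because it uses a larger mini-batch.
weights = [llr_weight(0.05), llr_weight(0.4), llr_weight(0.4)]
weighted = weighted_sign_vote(signs, weights)
```

In this toy input the reliable worker is outvoted on the first coordinate under equal weights, but its vote prevails once the weights reflect its lower assumed error probability.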
Submitted 24 March, 2024;
originally announced March 2024.
-
Block Orthogonal Sparse Superposition Codes for $ \sf{L}^3 $ Communications: Low Error Rate, Low Latency, and Low Power Consumption
Authors:
Donghwa Han,
Bowhyung Lee,
Min Jang,
Donghun Lee,
Seho Myung,
Namyoon Lee
Abstract:
Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth novel joint demodulation and decoding methods for BOSS codes under fading channels. For a fast fading channel, we present a minimum mean square error approximate maximum a posteriori (MMSE-A-MAP) algorithm for the joint demodulation and decoding when channel state information is available at the receiver (CSIR). We also propose a joint demodulation and decoding method without using CSIR for a block fading channel scenario. We refer to this as the non-coherent sphere decoding (NSD) algorithm. Simulation results demonstrate that BOSS codes with MMSE-A-MAP decoding outperform CRC-aided polar codes, while NSD decoding achieves comparable performance to quasi-maximum likelihood decoding with significantly reduced complexity. Both decoding algorithms are suitable for parallelization, satisfying low-latency constraints. Additionally, real-time simulations on a software-defined radio testbed validate the feasibility of using BOSS codes for low-power transmission.
Submitted 22 March, 2024;
originally announced March 2024.
-
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Authors:
Nicholas Lee,
Thanakul Wattanawong,
Sehoon Kim,
Karttikeya Mangalam,
Sheng Shen,
Gopala Anumanchipalli,
Michael W. Mahoney,
Kurt Keutzer,
Amir Gholami
Abstract:
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation strategy that uses a teacher LLM to enhance a small seed dataset by augmenting additional data that can be used for fine-tuning on a specific task. LLM2LLM (1) fine-tunes a baseline student LLM on the initial seed data, (2) evaluates and extracts data points that the model gets wrong, and (3) uses a teacher LLM to generate synthetic data based on these incorrect data points, which are then added back into the training data. This approach amplifies the signal from incorrectly predicted data points by the LLM during training and reintegrates them into the dataset to focus on more challenging examples for the LLM. Our results show that LLM2LLM significantly enhances the performance of LLMs in the low-data regime, outperforming both traditional fine-tuning and other data augmentation baselines. LLM2LLM reduces the dependence on labor-intensive data curation and paves the way for more scalable and performant LLM solutions, allowing us to tackle data-constrained domains and tasks. We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime using a Llama-2-7B student model. Our code is available at https://github.com/SqueezeAILab/LLM2LLM .
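The three-step loop in this abstract can be sketched in a few lines. This is a schematic under stated assumptions, not the released implementation (see the repository linked above for that): `fine_tune`, `is_correct`, and `teacher_generate` are hypothetical callables supplied by the caller, the number of rounds is arbitrary, and evaluating only on the seed data is a choice made for this sketch.

```python
def llm2llm_loop(seed_data, fine_tune, is_correct, teacher_generate, rounds=3):
    """Iteratively augment a small seed dataset with teacher-generated data.

    fine_tune(data) -> student          (hypothetical: trains a student model)
    is_correct(student, example) -> bool (hypothetical: checks one example)
    teacher_generate(example) -> example (hypothetical: new synthetic example)
    """
    train_data = list(seed_data)
    # Step 1: fine-tune the student on the initial seed data.
    student = fine_tune(train_data)
    for _ in range(rounds):
        # Step 2: collect the seed examples the student still gets wrong.
        wrong = [ex for ex in seed_data if not is_correct(student, ex)]
        if not wrong:
            break
        # Step 3: the teacher writes synthetic examples based on the failures,
        # which are added back into the training data.
        train_data.extend(teacher_generate(ex) for ex in wrong)
        student = fine_tune(train_data)
    return student, train_data


# Toy stand-ins so the sketch runs without any model: the "student" is just
# the training-set size, and an example is solved once the student is at
# least as large as its difficulty.
seed = [{"difficulty": d, "synthetic": False} for d in (1, 2, 5, 6)]
student, data = llm2llm_loop(
    seed,
    fine_tune=len,
    is_correct=lambda s, ex: ex["difficulty"] <= s,
    teacher_generate=lambda ex: {"difficulty": ex["difficulty"], "synthetic": True},
)
```

The toy stand-ins only exercise the control flow; they say nothing about the accuracy gains reported in the abstract.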
Submitted 13 July, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
-
Full-Duplex MU-MIMO Systems with Coarse Quantization: How Many Bits Do We Need?
Authors:
Seunghyeong Yoo,
Seokjun Park,
Mintaek Oh,
Namyoon Lee,
Jinseok Choi
Abstract:
This paper investigates full-duplex (FD) multi-user multiple-input multiple-output (MU-MIMO) system design with coarse quantization. We first analyze the impact of self-interference (SI) on quantization in FD single-input single-output systems. The analysis elucidates that the minimum required number of analog-to-digital converter (ADC) bits is logarithmically proportional to the ratio of total received power to the received power of desired signals. Motivated by this, we design an FD MIMO beamforming method that effectively manages the SI. Dividing a spectral efficiency maximization beamforming problem into two sub-problems for alternating optimization, we address the first by optimizing the precoder: obtaining a generalized eigenvalue problem from the first-order optimality condition, where the principal eigenvector is the optimal stationary solution, and adopting a power iteration method to identify this eigenvector. Subsequently, a quantization-aware minimum mean square error combiner is computed for the derived precoder. Through numerical studies, we observe that the proposed beamformer reduces the minimum required number of ADC bits for achieving higher spectral efficiency than that of half-duplex (HD) systems, compared to FD benchmarks. The overall analysis shows that, unlike with quantized HD systems, more than 6 bits are required for the ADC to fully realize the potential of the quantized FD system.
Submitted 18 March, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Modeling and Coverage Analysis of K-Tier Integrated Satellite-Terrestrial Downlink Networks
Authors:
Jungbin Yim,
Jeonghun Park,
Namyoon Lee
Abstract:
Integrated satellite-terrestrial networks (ISTNs) can significantly expand network coverage while diminishing reliance on terrestrial infrastructure. Despite the enticing potential of ISTNs, there is no comprehensive mathematical performance analysis framework for these emerging networks. In this paper, we introduce a tractable approach to analyze the downlink coverage performance of multi-tier ISTNs, where each network tier operates with orthogonal frequency bands. The proposed approach is to model the spatial distribution of cellular and satellite base stations using homogeneous Poisson point processes arranged on concentric spheres with varying radii. Central to our analysis is a displacement principle that transforms base station locations on different spheres into projected rings while preserving the distance distribution to the typical user. By incorporating the effects of Shadowed-Rician fading on satellite channels and employing orthogonal frequency bands, we derive analytical expressions for coverage in the integrated networks while keeping full generality. Our primary discovery is that network performance reaches its maximum when selecting the optimal density ratio of users associated with the network according to the density and the channel parameters of each network. Through simulations, we validate the precision of our derived expressions.
Submitted 17 March, 2024;
originally announced March 2024.
-
Nonlinear Self-Interference Cancellation With Learnable Orthonormal Polynomials for Full-Duplex Wireless Systems
Authors:
Hyowon Lee,
Jungyeon Kim,
Geon Choi,
Ian P. Roberts,
Jinseok Choi,
Namyoon Lee
Abstract:
Nonlinear self-interference cancellation (SIC) is essential for full-duplex communication systems, which can offer twice the spectral efficiency of traditional half-duplex systems. The challenge of nonlinear SIC is similar to the classic problem of system identification in adaptive filter theory, whose crux lies in identifying the optimal nonlinear basis functions for a nonlinear system. This becomes especially difficult when the system input has a non-stationary distribution. In this paper, we propose a novel algorithm for nonlinear digital SIC that adaptively constructs orthonormal polynomial basis functions according to the non-stationary moments of the transmit signal. By combining these basis functions with the least mean squares (LMS) algorithm, we introduce a new SIC technique, called the adaptive orthonormal polynomial LMS (AOP-LMS) algorithm. To reduce computational complexity for practical systems, we augment our approach with a precomputed look-up table, which maps a given modulation and coding scheme to its corresponding basis functions. Numerical simulation indicates that our proposed method surpasses existing state-of-the-art SIC algorithms in terms of convergence speed and mean squared error when the transmit signal is non-stationary, such as with adaptive modulation and coding. Experimental evaluation with a wireless testbed confirms that our proposed approach outperforms existing digital SIC algorithms.
Submitted 17 March, 2024;
originally announced March 2024.
-
Augmenting Interpolation-Based Model Checking with Auxiliary Invariants (Extended Version)
Authors:
Dirk Beyer,
Po-Chun Chien,
Nian-Ze Lee
Abstract:
Software model checking is a challenging problem, and generating relevant invariants is a key factor in proving the safety properties of a program. Program invariants can be obtained by various approaches, including lightweight procedures based on data-flow analysis and intensive techniques using Craig interpolation. Although data-flow analysis runs efficiently, it often produces invariants that are too weak to prove the properties. By contrast, interpolation-based approaches build strong invariants from interpolants, but they might not scale well due to expensive interpolation procedures. Invariants can also be injected into model-checking algorithms to assist the analysis. Invariant injection has been studied for many well-known approaches, including k-induction, predicate abstraction, and symbolic execution. We propose an augmented interpolation-based verification algorithm that injects external invariants into interpolation-based model checking (McMillan, 2003), a hardware model-checking algorithm recently adopted for software verification. The auxiliary invariants help prune unreachable states in Craig interpolants and confine the analysis to the reachable parts of a program. We implemented the proposed technique in the verification framework CPAchecker and evaluated it against mature SMT-based methods in CPAchecker as well as other state-of-the-art software verifiers. We found that injecting invariants reduces the number of interpolation queries needed to prove safety properties and improves the run-time efficiency. Consequently, the proposed invariant-injection approach verified difficult tasks that neither its plain version (i.e., without invariants), the invariant generator, nor any of the compared tools could solve.
Submitted 12 March, 2024;
originally announced March 2024.
-
ORPO: Monolithic Preference Optimization without Reference Model
Authors:
Jiwoo Hong,
Noah Lee,
James Thorne
Abstract:
While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce a straightforward and innovative reference model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional preference alignment phase. We demonstrate, both empirically and theoretically, that the odds ratio is a sensible choice for contrasting favored and disfavored styles during SFT across the diverse sizes from 125M to 7B. Specifically, fine-tuning Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) with ORPO on the UltraFeedback alone surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters: achieving up to 12.20% on $\text{AlpacaEval}_{2.0}$ (Figure 1), 66.19% on IFEval (instruction-level loose, Table 6), and 7.32 in MT-Bench (Figure 12). We release code and model checkpoints for Mistral-ORPO-$α$ (7B) and Mistral-ORPO-$β$ (7B).
Submitted 14 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Multi-reference coupled cluster theory using the normal ordered exponential ansatz
Authors:
Alexander Gunasekera,
Nicholas Lee,
David P. Tew
Abstract:
Properly spin-adapted coupled-cluster theory for general open-shell configurations remains an active area of research in electronic structure theory. In this contribution we examine Lindgren's normal-ordered exponential ansatz to correlate specific spin states using spin-free excitation operators, with the aid of automatic equation generation software. We present an intermediately normalised and size-extensive reformulation of the unlinked working equations, and analyse the performance of the method with single and double excitations for simple molecular systems in terms of accuracy and size-consistency.
Submitted 30 September, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Bispectral duality and separation of variables from surface defect transition
Authors:
Saebyeok Jeong,
Norton Lee
Abstract:
We study two types of surface observables $-$ the $\mathbf{Q}$-observables and the $\mathbf{H}$-observables $-$ of the 4d $\mathcal{N}=2$ $A_1$-quiver $U(N)$ gauge theory obtained by coupling a 2d $\mathcal{N}=(2,2)$ gauged linear sigma model. We demonstrate that the transition between the two surface defects manifests as a Fourier transformation between the surface observables. Utilizing the results from our previous works, which establish that the $\mathbf{Q}$-observables and the $\mathbf{H}$-observables give rise, respectively, to the $Q$-operators on the evaluation module over the Yangian $Y(\mathfrak{gl}(2))$ and the Hecke operators on the twisted $\widehat{\mathfrak{sl}}(N)$-coinvariants, we derive an exact duality between the spectral problems of the $\mathfrak{gl}(2)$ XXX spin chain with $N$ sites and the $\mathfrak{sl}(N)$ Gaudin model with 4 sites, both of which are defined on bi-infinite modules. Moreover, we present a dual description of the monodromy surface defect as coupling a 2d $\mathcal{N}=(2,2)$ gauged linear sigma model. Employing this dual perspective, we demonstrate how the monodromy surface defect undergoes a transition to multiple $\mathbf{Q}$-observables or $\mathbf{H}$-observables, implemented through integral transformations between their surface observables. These transformations provide, respectively, $\hbar$-deformation and a higher-rank generalization of the KZ/BPZ correspondence. In the limit $\varepsilon_2\to 0$, they give rise to the quantum separation of variables for the $\mathfrak{gl}(2)$ XXX spin chain and the $\mathfrak{sl}(N)$ Gaudin model, respectively.
Submitted 21 February, 2024;
originally announced February 2024.
-
di-Langlands correspondence and extended observables
Authors:
Saebyeok Jeong,
Norton Lee,
Nikita Nekrasov
Abstract:
We explore the $\textit{difference Langlands correspondence}$ using the four dimensional ${\mathcal{N}}=2$ super-QCD. Surface defects and surface observables play a crucial role. As an application, we give the first construction of the full set of quantum integrals, i.e. commuting differential operators, such that the partition function of the so-called regular monodromy surface defect is their joint eigenvector in an evaluation module over the Yangian $Y(\mathfrak{gl}(2))$, making it the wavefunction of an $N$-site $\mathfrak{gl}(2)$ spin chain with bi-infinite spin modules. We construct the $\mathbf{Q}$- and $\tilde{\mathbf{Q}}$-surface observables, which are believed to be the $Q$-operators on the bi-infinite module over the Yangian $Y(\mathfrak{gl}(2))$, and compute their eigenvalues, the $Q$-functions, as vevs of the surface observables.
Submitted 21 February, 2024;
originally announced February 2024.
-
Enumeration of multiplex juggling card sequences using generalized q-derivatives
Authors:
Yumin Cho,
Jaehyun Kim,
Jang Soo Kim,
Nakyung Lee
Abstract:
In 2019, Butler, Choi, Kim, and Seo introduced a new type of juggling card that represents multiplex juggling patterns in a natural bijective way. They conjectured a formula for the generating function for the number of multiplex juggling cards with capacity 2. In this paper we prove their conjecture. More generally, we find an explicit formula for the generating function with any capacity. We also find an expression for the generating function for multiplex juggling card sequences by introducing a generalization of the q-derivative operator. As a consequence, we show that this generating function is a rational function.
Submitted 15 February, 2024;
originally announced February 2024.