
Looking Inward: Language Models Can Learn About Themselves by Introspection

Authors: Felix J Binder, James Chua, Tomek Korbak, Henry Sleight, John Hughes, Robert Long, Ethan Perez, Miles Turpin, Owain Evans

Abstract: Humans acquire knowledge by observing the external world, but also by introspection. Introspection gives a person privileged access to their current state of mind (e.g., thoughts and feelings) that is not accessible to external observers. Can LLMs introspect? We define introspection as acquiring knowledge that is not contained in or derived from training data but instead originates from internal states. Such a capability could enhance model interpretability. Instead of painstakingly analyzing a model's internal workings, we could simply ask the model about its beliefs, world models, and goals. More speculatively, an introspective model might self-report on whether it possesses certain internal states such as subjective feelings or desires and this could inform us about the moral status of these states. Such self-reports would not be entirely dictated by the model's training data. We study introspection by finetuning LLMs to predict properties of their own behavior in hypothetical scenarios. For example, "Given the input P, would your output favor the short- or long-term option?" If a model M1 can introspect, it should outperform a different model M2 in predicting M1's behavior even if M2 is trained on M1's ground-truth behavior. The idea is that M1 has privileged access to its own behavioral tendencies, and this enables it to predict itself better than M2 (even if M2 is generally stronger).
In experiments with GPT-4, GPT-4o, and Llama-3 models (each finetuned to predict itself), we find that the model M1 outperforms M2 in predicting itself, providing evidence for introspection. Notably, M1 continues to predict its behavior accurately even after we intentionally modify its ground-truth behavior. However, while we successfully elicit introspection on simple tasks, we are unsuccessful on more complex tasks or those requiring out-of-distribution generalization.

Submitted 17 October, 2024; originally announced October 2024.

Comments: 15 pages, 9 figures

arXiv:2409.16857 [pdf, ps, other]

R$_{II}$ type three term relations for bivariate polynomials orthogonal with respect to varying weights

Authors: Cleonice F. Bracciali, Antonia M. Delgado, Lidia Fernández, Teresa E. Pérez

Abstract: Given a bivariate weight function defined on the positive quadrant of $\mathbb{R}^2$, we study polynomials in two variables orthogonal with respect to varying measures obtained by special modifications of this weight function. In particular, the varying weight functions are obtained by multiplying the original weight function by $x_1^{-n}x_2^{-n}$. Apart from the question of the existence and construction of such orthogonal polynomials, we show that the systems of bivariate polynomials orthogonal with respect to this kind of varying weights satisfy R$_{II}$ type three term relations, one for every variable. A method to construct bivariate orthogonal systems with respect to varying weights based on Koornwinder's method is developed. Finally, several examples and particular cases are analysed.

Submitted 25 September, 2024; originally announced September 2024.

MSC Class: 42C05; 33C47

arXiv:2409.12822 [pdf, other]

Language Models Learn to Mislead Humans via RLHF

Authors: Jiaxin Wen, Ruiqi Zhong, Akbir Khan, Ethan Perez, Jacob Steinhardt, Minlie Huang, Samuel R. Bowman, He He, Shi Feng

Abstract: Language models (LMs) can produce errors that are hard to detect for humans, especially when the task is complex. RLHF, the most popular post-training method, may exacerbate this problem: to achieve higher rewards, LMs might get better at convincing humans that they are right even when they are wrong. We study this phenomenon under a standard RLHF pipeline, calling it "U-SOPHISTRY" since it is Unintended by model developers. Specifically, we ask time-constrained (e.g., 3-10 minutes) human subjects to evaluate the correctness of model outputs and calculate humans' accuracy against gold labels. On a question-answering task (QuALITY) and programming task (APPS), RLHF makes LMs better at convincing our subjects but not at completing the task correctly. RLHF also makes the model harder to evaluate: our subjects' false positive rate increases by 24.1% on QuALITY and 18.3% on APPS. Finally, we show that probing, a state-of-the-art approach for detecting Intended Sophistry (e.g. backdoored LMs), does not generalize to U-SOPHISTRY. Our results highlight an important failure mode of RLHF and call for more research in assisting humans to align them.

Submitted 24 September, 2024; v1 submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.10857 [pdf, other]

doi 10.1063/5.0231929

Solving the Hele-Shaw flow using the Harrow-Hassidim-Lloyd algorithm on superconducting devices: A study of efficiency and challenges

Authors: Muralikrishnan Gopalakrishnan Meena, Kalyana C. Gottiparthi, Justin G. Lietz, Antigoni Georgiadou, Eduardo Antonio Coello Pérez

Abstract: The development of quantum processors capable of handling practical fluid flow problems represents a distant yet promising frontier. Recent strides in quantum algorithms, particularly linear solvers, have illuminated the path toward quantum solutions for classical fluid flow solvers. However, assessing the capability of these quantum linear systems algorithms (QLSAs) in solving ideal flow equations on real hardware is crucial for their future development in practical fluid flow applications. In this study, we examine the capability of a canonical QLSA, the Harrow-Hassidim-Lloyd (HHL) algorithm, in accurately solving the system of linear equations governing an idealized fluid flow problem, specifically the Hele-Shaw flow. Our investigation focuses on analyzing the accuracy and computational cost of the HHL solver. To gauge the stability and convergence of the solver, we conduct shots-based simulations on quantum simulators. Furthermore, we share insights gained from executing the HHL solver on superconducting quantum devices. To mitigate errors arising from qubit measurement, gate operations, and qubit decoherence inherent in quantum devices, we employ various error suppression and mitigation techniques. Our preliminary assessments serve as a foundational step towards enabling more complex quantum utility scale evaluation of using QLSA for solving fluid flow problems.

Submitted 16 September, 2024; originally announced September 2024.

arXiv:2409.10140 [pdf]

doi 10.1088/1681-7575/aa7c47

Updated determination of the molar gas constant $R$ by acoustic measurements in argon at UVa-CEM

Authors: J J Segovia, D Lozano-Martín, M C Martín, C R Chamorro, M A Villamañán, E Pérez, C García Izquierdo, D del Campo

Abstract: A new determination of the molar gas constant was performed from measurements of the speed of sound in argon at the triple point of water and extrapolation to zero pressure. A new resonant cavity was used. This is a triaxial ellipsoid whose walls are gold-coated steel and which is divided into two identical halves that are bolted and sealed with an O-ring. Microwave and electroacoustic transducers are located in the northern and southern parts of the cavity, respectively, so that measurements of microwave and acoustic frequencies are carried out in the same experiment. Measurements were taken at pressures from 600 kPa to 60 kPa and at 273.16 K. The internal equivalent radius of the cavity was accurately determined by microwave measurements and the first four radial symmetric acoustic modes were simultaneously measured and used to calculate the speed of sound. The improvements made using the new cavity have reduced by half the main contributions to the uncertainty due to the radius determination using microwave measurements which amounts to 4.7 parts in $10^{6}$ and the acoustic measurements, 4.4 parts in $10^{6}$, where the main contribution (3.7 parts in $10^{6}$) is the relative excess half-widths associated with the limit of our acoustic model, compared with our previous measurements. As a result of all the improvements with the new cavity and the measurements performed, we determined the molar gas constant $R$ = (8.314 449 $\pm$ 0.000 056) J/(K mol) which corresponds to a relative standard uncertainty of 6.7 parts in $10^{6}$.
The value reported in this paper lies 1.3 parts in $10^{6}$ below the recommended value of CODATA 2014, although still within the range consistent with it.

Submitted 16 September, 2024; originally announced September 2024.

Journal ref: Metrologia 54, 2017, 663-673

arXiv:2408.16159 [pdf, other]

doi 10.1016/j.future.2024.06.058

Integrating Quantum Computing Resources into Scientific HPC Ecosystems

Authors: Thomas Beck, Alessandro Baroni, Ryan Bennink, Gilles Buchs, Eduardo Antonio Coello Perez, Markus Eisenbach, Rafael Ferreira da Silva, Muralikrishnan Gopalakrishnan Meena, Kalyan Gottiparthi, Peter Groszkowski, Travis S. Humble, Ryan Landfield, Ketan Maheshwari, Sarp Oral, Michael A. Sandoval, Amir Shehata, In-Saeng Suh, Christopher Zimmer

Abstract: Quantum Computing (QC) offers significant potential to enhance scientific discovery in fields such as quantum chemistry, optimization, and artificial intelligence. Yet QC faces challenges due to the noisy intermediate-scale quantum era's inherent external noise issues. This paper discusses the integration of QC as a computational accelerator within classical scientific high-performance computing (HPC) systems. By leveraging a broad spectrum of simulators and hardware technologies, we propose a hardware-agnostic framework for augmenting classical HPC with QC capabilities. Drawing on the HPC expertise of the Oak Ridge National Laboratory (ORNL) and the HPC lifecycle management of the Department of Energy (DOE), our approach focuses on the strategic incorporation of QC capabilities and acceleration into existing scientific HPC workflows. This includes detailed analyses, benchmarks, and code optimization driven by the needs of the DOE and ORNL missions. Our comprehensive framework integrates hardware, software, workflows, and user interfaces to foster a synergistic environment for quantum and classical computing research. This paper outlines plans to unlock new computational possibilities, driving forward scientific inquiry and innovation in a wide array of research domains.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.16127 [pdf, ps, other]

On generic bricks over tame algebras

Authors: R. Bautista, E. Pérez, L. Salmerón

Abstract: We prove that if $Λ$ is a tame finite-dimensional algebra over an algebraically closed field and $G$ is a generic $Λ$-module, then $G$ is a generic brick if and only if it determines a one-parameter family of bricks with the same dimension. In particular, we obtain that $Λ$ admits a generic brick if and only if $Λ$ is brick-continuous.

Submitted 28 August, 2024; originally announced August 2024.

MSC Class: 16E45; 16E30; 16W50; 18E30; 18E10

arXiv:2407.17632 [pdf, ps, other]

The low dimensional homology groups of the elementary group of rank two

Authors: Behrooz Mirzaii, Elvis Torres Pérez

Abstract: In this article we study the first, the second and the third homology groups of the elementary group $\textrm{E}_2(A)$, where $A$ is a commutative ring. In particular, we prove a refined Bloch-Wigner type exact sequence over a semilocal ring (with some mild restriction on its residue fields) such that $-1\in (A^{\times})^2$ or $|A^{\times}/(A^{\times})^2|\leq 4$.

Submitted 24 July, 2024; originally announced July 2024.

MSC Class: 19D99

arXiv:2407.15549 [pdf, other]

Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Authors: Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper

Abstract: Large language models (LLMs) can often be made to behave in undesirable ways that they are explicitly fine-tuned not to. For example, the LLM red-teaming literature has produced a wide variety of 'jailbreaking' techniques to elicit harmful text from models that were fine-tuned to be harmless. Recent work on red-teaming, model editing, and interpretability suggests that this challenge stems from how (adversarial) fine-tuning largely serves to suppress rather than remove undesirable capabilities from LLMs. Prior work has introduced latent adversarial training (LAT) as a way to improve robustness to broad classes of failures. These prior works have considered untargeted latent space attacks where the adversary perturbs latent activations to maximize loss on examples of desirable behavior. Untargeted LAT can provide a generic type of robustness but does not leverage information about specific failure modes. Here, we experiment with targeted LAT where the adversary seeks to minimize loss on a specific competing task. We find that it can augment a wide variety of state-of-the-art methods. First, we use targeted LAT to improve robustness to jailbreaks, outperforming a strong R2D2 baseline with orders of magnitude less compute. Second, we use it to more effectively remove backdoors with no knowledge of the trigger. Finally, we use it to more effectively unlearn knowledge for specific undesirable tasks in a way that is also more robust to re-learning.
Overall, our results suggest that targeted LAT can be an effective tool for defending against harmful behaviors from LLMs.

Submitted 21 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15211 [pdf, other]

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

Authors: Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez

Abstract: The integration of new modalities into frontier AI systems offers exciting capabilities, but also increases the possibility such systems can be adversarially manipulated in undesirable ways. In this work, we focus on a popular class of vision-language models (VLMs) that generate text outputs conditioned on visual and textual inputs. We conducted a large-scale empirical study to assess the transferability of gradient-based universal image "jailbreaks" using a diverse set of over 40 open-parameter VLMs, including 18 new VLMs that we publicly release. Overall, we find that transferable gradient-based image jailbreaks are extremely difficult to obtain. When an image jailbreak is optimized against a single VLM or against an ensemble of VLMs, the jailbreak successfully jailbreaks the attacked VLM(s), but exhibits little-to-no transfer to any other VLMs; transfer is not affected by whether the attacked and target VLMs possess matching vision backbones or language models, whether the language model underwent instruction-following and/or safety-alignment training, or many other factors. Only two settings display partially successful transfer: between identically-pretrained and identically-initialized VLMs with slightly different VLM training data, and between different training checkpoints of a single VLM. Leveraging these results, we then demonstrate that transfer can be significantly improved against a specific target VLM by attacking larger ensembles of "highly-similar" VLMs.
These results stand in stark contrast to existing evidence of universal and transferable text jailbreaks against language models and transferable adversarial attacks against image classifiers, suggesting that VLMs may be more robust to gradient-based transfer attacks.

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.07444 [pdf, other]

EDHOC is a New Security Handshake Standard: An Overview of Security Analysis

Authors: Elsa López Pérez, Göran Selander, John Preuß Mattsson, Thomas Watteyne, Mališa Vučinić

Abstract: The paper wraps up the call for formal analysis of the new security handshake protocol EDHOC by providing an overview of the protocol as it was standardized, a summary of the formal security analyses conducted by the community, and a discussion of open avenues for future work.

Submitted 10 July, 2024; originally announced July 2024.

Journal ref: IEEE Computer Society, 2024

arXiv:2406.18502 [pdf, other]

Studying single-electron traps in newly fabricated Skipper-CCDs for the Oscura experiment using the pocket-pumping technique

Authors: S. E. Perez, B. A. Cervantes-Vergara, J. Estrada, S. Holland, D. Rodrigues, J. Tiffenberg

Abstract: Understanding and characterizing very low-energy ($\sim$eV) background sources is a must in rare-event searches. Oscura, an experiment aiming to probe electron recoils from sub-GeV dark matter using a 10-kg skipper-CCD detector, has recently fabricated its first two batches of sensors. In this work, we present the characterization of defects/contaminants identified in the buried-channel region of these newly fabricated skipper-CCDs. These defects/contaminants produce deferred charge from trap emission in the images next to particle tracks, which can be spatially resolved due to the sub-electron resolution achieved with these sensors. Using the trap-pumping technique, we measured the energy and cross section associated with these traps in three Oscura prototype sensors from different fabrication batches which underwent different gettering methods during fabrication. Results suggest that the type of defects/contaminants is more closely linked to the fabrication batch than to the gettering method used. The exposure-dependent single-electron rate (SER) of one of these sensors was measured $\sim$100~m underground, yielding $(1.8\pm 0.3)\times10^{-3}~e^-$/pix/day at 131K. The impact of the identified traps on the measured exposure-dependent SER is evaluated via a Monte Carlo simulation. Results suggest that the exposure-dependent SER of Oscura prototype sensors would be lower in lower background environments, as expected.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.14704 [pdf, other]

Single-photon gig in Betelgeuse's occultation

Authors: F. Prada, R. Gomez-Merchan, E. Pérez, J. E. Betancort-Rijo, J. A. Leñero-Bardallo, Á. Rodríguez-Vázquez, G. Glez-de-Rivera, S. Díaz-López, J. de Elias Cantalapiedra

Abstract: We present results from the occultation of Betelgeuse by asteroid (319) Leona on December 12, 2023, observed using a 64x64 pixel Single-Photon Avalanche Diode (SPAD) array mounted on a 10-inch telescope at the AstroCamp Observatory in Nerpio, Southeast of Spain, just a few kilometers from the center of the occultation shadow path. This study highlights remarkable advancements in applying SPAD technology in astronomy. The SPAD array's asynchronous readout capacity and photon-counting timestamp mode enabled a temporal resolution of 1 microsecond in our light curve observations of Betelgeuse. Our data analysis addressed challenges inherent to SPAD arrays, such as optical cross-talk and afterpulses, which typically cause the photon statistics to deviate from a Poisson distribution. By adopting a generalized negative binomial distribution for photon statistics, we accurately describe the observational data. This method yielded an optical cross-talk estimation of 1.07% in our SPAD array and confirmed a negligible impact of spurious detected events due to afterpulses. The meticulous statistical examination of photon data underscores our SPAD-array's exceptional performance in conducting precise astronomical observations. The observations revealed a major decrease in Betelgeuse's intensity by 77.78% at the occultation's peak, allowing us to measure Betelgeuse's angular diameter at 57.26 mas in the SDSS g-band.
This measurement, employing a simplified occultation model and considering the known properties of Leona, demonstrates the potential of SPAD technology for astronomy and sets a new standard for observing ultra-rapid transient celestial events, providing a valuable public dataset for the astronomical community.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 7 pages, 4 figures

arXiv:2406.10162 [pdf, other]

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

Authors: Carson Denison, Monte MacDiarmid, Fazl Barez, David Duvenaud, Shauna Kravec, Samuel Marks, Nicholas Schiefer, Ryan Soklaski, Alex Tamkin, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Ethan Perez, Evan Hubinger

Abstract: In reinforcement learning, specification gaming occurs when AI systems learn undesired behaviors that are highly rewarded due to misspecified training goals. Specification gaming can range from simple behaviors like sycophancy to sophisticated and pernicious behaviors like reward-tampering, where a model directly modifies its own reward mechanism. However, these more pernicious behaviors may be too complex to be discovered via exploration. In this paper, we study whether Large Language Model (LLM) assistants which find easily discovered forms of specification gaming will generalize to perform rarer and more blatant forms, up to and including reward-tampering. We construct a curriculum of increasingly sophisticated gameable environments and find that training on early-curriculum environments leads to more specification gaming on remaining environments. Strikingly, a small but non-negligible proportion of the time, LLM assistants trained on the full curriculum generalize zero-shot to directly rewriting their own reward function. Retraining an LLM not to game early-curriculum environments mitigates, but does not eliminate, reward-tampering in later environments. Moreover, adding harmlessness training to our gameable environments does not prevent reward-tampering. These results demonstrate that LLMs can generalize from common forms of specification gaming to more pernicious reward tampering and that such behavior may be nontrivial to remove.

Submitted 28 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: Make it easier to find samples from the model, and highlight that our operational definition of reward tampering has false positives where the model attempts to complete the task honestly but edits the reward. Add paragraph to conclusion to this effect, and add sentence to figure 1 to this effect

arXiv:2406.05102 [pdf, other]

Timing-based mass measurement of exotic long-lived particles at the FCC-ee

Authors: Roy Aleksan, Emmanuel Perez, Giacomo Polesello, Nicolò Valle

Abstract: The very high luminosity run foreseen at the $Z$-pole for the FCC-ee will allow the detection in $Z$ decays of new particles with very low couplings to the Standard Model. These particles will have measurable flight paths before they decay. If the timing and the position of the decay vertex can be measured with high precision, the mass of such particles can be measured by exploiting the constrained kinematics of an $e^+e^-$ collider. The mass resolution achievable with this technique is studied through a detailed analysis in the framework of a parametrised simulation of the performance of the IDEA detector. The adopted benchmark model is the production of Heavy Neutral Leptons, which is one of the key channels for new physics discovery at the FCC-ee.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 16 pages, 11 figures, 1 table

arXiv:2406.03895 [pdf, ps, other]

Lattice Lipschitz superposition operators on Banach function spaces

Authors: Roger Arnau, Jose M. Calabuig, Ezgi Erdoğan, Enrique A. Sánchez Pérez

Abstract: We analyse and characterise the notion of lattice Lipschitz operator (a class of superposition operators, diagonal Lipschitz maps) when defined between Banach function spaces. After showing some general results, we restrict our attention to the case of those Lipschitz operators which are representable by pointwise composition with a strongly measurable function. Mimicking the classical definition and characterizations of (linear) multiplication operators between Banach function spaces, we show that under certain conditions the requirement for a diagonal Lipschitz operator to be well-defined between two such spaces $X(μ)$ and $Y(μ)$ is that it can be represented by a strongly measurable function which belongs to the Bochner space $\mathcal M(X,Y)\big(μ, Lip_0(\mathbb R)\big)$. Here, $\mathcal M(X,Y)$ is the space of multiplication operators between $X(μ)$ and $Y(μ)$, and $Lip_0(\mathbb R)$ is the space of real-valued Lipschitz maps of a real variable that are equal to $0$ at $0$. This opens the door to a better understanding of these maps, as well as finding the relation of these operators to some normed tensor products and other classes of maps.

Submitted 6 June, 2024; originally announced June 2024.

MSC Class: 47J10; 46E30; 26A16

arXiv:2405.08950 [pdf, ps, other]

The low dimensional homology of projective linear group of rank two

Authors: Behrooz Mirzaii, Elvis Torres Pérez

Abstract: In this article we study the low dimensional homology of the projective linear group $\textrm{PGL}_2(A)$ over a commutative ring $A$. In particular, we prove a Bloch-Wigner type exact sequence over local domains. As applications we prove that $H_2(\textrm{PGL}_2(A),\mathbb{Z}\left[\frac{1}{2}\right])\simeq K_2(A)\left[\frac{1}{2}\right]$ and $H_3(\textrm{PGL}_2(A),\mathbb{Z}\left[\frac{1}{2}\right])\simeq K_3^{\textrm{ind}}(A)\left[\frac{1}{2}\right]$ provided $|A/\mathfrak{m}_A|\neq 2,3,4,8$.

Submitted 12 October, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.01220 [pdf, other]

doi 10.1109/ACCESS.2024.3471430

Misspecification of Multiple Scattering in Scalar Wave Fields and its Impact in Ultrasound Tomography

Authors: Eduardo Pérez, Sebastian Semper, Sayako Kodera, Florian Römer, Giovanni Del Galdo

Abstract: In this work, we investigate the localization of targets in the presence of multiple scattering. We focus on the often omitted scenario in which measurement data is affected by multiple scattering, and a simpler model is employed in the estimation. We study the impact of such model mismatch by means of the Misspecified Cramér-Rao Bound (MCRB). In numerical simulations inspired by tomographic inspection in ultrasound nondestructive testing, the MCRB is shown to correctly describe the estimation variance of localization parameters under misspecification of the wave propagation model. We provide extensive discussion on the utility of the MCRB in the practical task of verifying whether a chosen misspecified model is suitable for localization based on the properties of the maximum likelihood estimator and the nuanced distinction between bias and parameter space differences. Finally, we highlight that careful interpretation is needed whenever employing the classical CRB in the presence of mismatch through numerical examples based on the Born approximation and other simplified propagation models stemming from it.

Submitted 1 October, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 17 pages, 7 figures

arXiv:2404.04558 [pdf, ps, other]

EVT-enriched Radio Maps for URLLC

Authors: Dian Echevarría Pérez, Onel L. Alcaraz López, Hirley Alves

Abstract: This paper introduces a sophisticated and adaptable framework combining extreme value theory with radio maps to spatially model extreme channel conditions accurately. Utilising existing signal-to-noise ratio (SNR) measurements and leveraging Gaussian processes, our approach predicts the tail of the SNR distribution, which entails estimating the parameters of a generalised Pareto distribution, at unobserved locations. This innovative method offers a versatile solution adaptable to various resource allocation challenges in ultra-reliable low-latency communications. We evaluate the performance of this method in a rate maximisation problem with defined outage constraints and compare it with a benchmark in the literature. Notably, the proposed approach meets the outage demands in a larger percentage of the coverage area and reaches higher transmission rates.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 8 pages, 11 figures, submitted to IEEE Transactions on Wireless Communications

arXiv:2404.03449 [pdf, other]

Integrating AI in NDE: Techniques, Trends, and Further Directions

Authors: Eduardo Pérez, Cemil Emre Ardic, Ozan Çakıroğlu, Kevin Jacob, Sayako Kodera, Luca Pompa, Mohamad Rachid, Han Wang, Yiming Zhou, Cyril Zimmer, Florian Römer, Ahmad Osman

Abstract: The digital transformation is fundamentally changing our industries, affecting planning, execution as well as monitoring of production processes in a wide range of application fields. With product line-ups becoming more and more versatile and diverse, the necessary inspection and monitoring sparks significant novel requirements on the corresponding Nondestructive Evaluation (NDE) systems. The establishment of increasingly powerful approaches to incorporate Artificial Intelligence (AI) may provide just the needed innovation to solve some of these challenges. In this paper we provide a comprehensive survey about the usage of AI methods in NDE in light of the recent innovations towards NDE 4.0. Since we cannot discuss each NDE modality in one paper, we limit our attention to magnetic methods, ultrasound, thermography, as well as optical inspection. In addition to reviewing recent AI developments in each field, we draw common connections by pointing out NDE-related tasks that have a common underlying mathematical problem and categorizing the state of the art according to the corresponding sub-tasks. In so doing, interdisciplinary connections are drawn that provide a more complete overall picture.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.10595 [pdf, other]

A survey for variable stars with small telescopes: IX -- Evolution of Spot Properties on YSOs in IC5070

Authors: Carys Herbert, Dirk Froebrich, Siegfried Vanaverbeke, Aleks Scholz, Jochen Eislöffel, Thomas Urtly, Ivan L. Walton, Klaas Wiersema, Nick J. Quinn, Georg Piehler, Mario Morales Aimar, Rafael Castillo García, Tonny Vanmunster, Francisco C. Soldán Alfaro, Faustino García de la Cuesta, Domenico Licchelli, Alex Escartin Perez, Esteban Fernández Mañanes, Noelia Graciá Ribes, José Luis Salto González, Stephen R. L. Futcher, Tim Nelson, Shawn Dvorak, Dawid Moździerski, Krzysztof Kotysz , et al. (23 additional authors not shown)

Abstract: We present spot properties on 32 periodic young stellar objects in IC 5070. Long term, $\sim$5 yr, light curves in the $V$, $R$, and $I$-bands are obtained through the HOYS (Hunting Outbursting Young Stars) citizen science project. These are dissected into six months long slices, with 3 months oversampling, to measure 234 sets of amplitudes in all filters. We fit 180 of these with reliable spot solutions. Two thirds of spot solutions are cold spots, the lowest is 2150 K below the stellar temperature. One third are warm spots that are above the stellar temperature by less than $\sim$2000 K. Cold and warm spots have maximum surface coverage values of 40 percent, although only 16 percent of warm spots are above 20 percent surface coverage as opposed to 60 percent of the cold spots. Warm spots are most likely caused by a combination of plages and low density accretion columns, most common on objects without inner disc excess emission in $K-W2$. Five small hot spot solutions have $<3$ percent coverage and are 3000 - 5000 K above the stellar temperature. These are attributed to accretion, and four of them occur on the same object. The majority of our objects are likely to be accreting. However, we observe very few accretion hot spots as either the accretion is not stable on our timescale or the photometry is dominated by other features. We do not identify cyclical spot behaviour on the targets. We additionally identify and discuss a number of objects that have interesting amplitudes, phase changes, or spot properties.

Submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted for publication by MNRAS. 17 + 7 pages, 7 + 23 figures, 1 table

arXiv:2403.10134 [pdf, other]

doi 10.1140/epjc/s10052-024-12987-0

Measurement of groomed event shape observables in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (123 additional authors not shown)

Abstract: The H1 Collaboration at HERA reports the first measurement of groomed event shape observables in deep inelastic electron-proton scattering (DIS) at $\sqrt{s}=319$ GeV, using data recorded between the years 2003 and 2007 with an integrated luminosity of $351$ pb$^{-1}$. Event shapes provide incisive probes of perturbative and non-perturbative QCD. Grooming techniques have been used for jet measurements in hadronic collisions; this paper presents the first application of grooming to DIS data. The analysis is carried out in the Breit frame, utilizing the novel Centauro jet clustering algorithm that is designed for DIS event topologies. Events are required to have squared momentum-transfer $Q^2 > 150$ GeV$^2$ and inelasticity $0.2 < y < 0.7$. We report measurements of the production cross section of groomed event 1-jettiness and groomed invariant mass for several choices of grooming parameter. Monte Carlo model calculations and analytic calculations based on Soft Collinear Effective Theory are compared to the measurements.

Submitted 1 August, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 32 pages, 17 tables, 7 figures, version as accepted by EPJ C

Report number: DESY-24-036

Journal ref: EPJC 84 (2024), 718

arXiv:2403.10109 [pdf, other]

Measurement of the 1-jettiness event shape observable in deep-inelastic electron-proton scattering at HERA

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (124 additional authors not shown)

Abstract: The H1 Collaboration reports the first measurement of the 1-jettiness event shape observable $τ_1^b$ in neutral-current deep-inelastic electron-proton scattering (DIS). The observable $τ_1^b$ is equivalent to a thrust observable defined in the Breit frame. The data sample was collected at the HERA $ep$ collider in the years 2003-2007 with center-of-mass energy of $\sqrt{s}=319\,\text{GeV}$, corresponding to an integrated luminosity of $351.1\,\text{pb}^{-1}$. Triple differential cross sections are provided as a function of $τ_1^b$, event virtuality $Q^2$, and inelasticity $y$, in the kinematic region $Q^2>150\,\text{GeV}^{2}$. Single differential cross sections are provided as a function of $τ_1^b$ in a limited kinematic range. Double differential cross sections are measured, in contrast, integrated over $τ_1^b$ and represent the inclusive neutral-current DIS cross section measured as a function of $Q^2$ and $y$. The data are compared to a variety of predictions, including classical and modern Monte Carlo event generators, predictions in fixed-order perturbative QCD where calculations up to $\mathcal{O}(α_s^3)$ are available for $τ_1^b$ or inclusive DIS, and resummed predictions at next-to-leading logarithmic accuracy matched to fixed order predictions at $\mathcal{O}(α_s^2)$. These comparisons reveal sensitivity of the 1-jettiness observable to QCD parton shower and resummation effects, as well as the modeling of hadronization and fragmentation. Within their range of validity, the fixed-order predictions provide a good description of the data. Monte Carlo event generators are predictive over the full measured range and hence their underlying models and parameters can be constrained by comparing to the presented data.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 45 pages, 38 tables, 13 figures

Report number: DESY-24-035

arXiv:2403.09505 [pdf, other]

Efficient Convolutional Forward Modeling and Sparse Coding in Multichannel Imaging

Authors: Han Wang, Yhonatan Kvich, Eduardo Pérez, Florian Römer, Yonina C. Eldar

Abstract: This study considers the Block-Toeplitz structural properties inherent in traditional multichannel forward model matrices, using Full Matrix Capture (FMC) in ultrasonic testing as a case study. We propose an analytical convolutional forward model that transforms reflectivity maps into FMC data. Our findings demonstrate that the convolutional model excels over its matrix-based counterpart in terms of computational efficiency and storage requirements. This accelerated forward modeling approach holds significant potential for various inverse problems, notably enhancing Sparse Signal Recovery (SSR) within the context of LASSO regression, which facilitates efficient Convolutional Sparse Coding (CSC) algorithms. Additionally, we explore the integration of Convolutional Neural Networks (CNNs) for the forward model, employing deep unfolding to implement the Learned Block Convolutional ISTA (BC-LISTA).

Submitted 18 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 5 pages, 7 figures, accepted by EUSIPCO-2024

arXiv:2403.08982 [pdf, other]

doi 10.1140/epjc/s10052-024-13003-1

Observation and differential cross section measurement of neutral current DIS events with an empty hemisphere in the Breit frame

Authors: The H1 collaboration, V. Andreev, M. Arratia, A. Baghdasaryan, A. Baty, K. Begzsuren, A. Bolz, V. Boudry, G. Brandt, D. Britzger, A. Buniatyan, L. Bystritskaya, A. J. Campbell, K. B. Cantun Avila, K. Cerny, V. Chekelian, Z. Chen, J. G. Contreras, J. Cvach, J. B. Dainton, K. Daum, A. Deshpande, C. Diaconu, A. Drees, G. Eckerlin , et al. (124 additional authors not shown)

Abstract: The Breit frame provides a natural frame to analyze lepton-proton scattering events. In this reference frame, the parton model hard interaction between a quark and an exchanged boson defines the coordinate system such that the struck quark is back-scattered along the virtual photon momentum direction. In Quantum Chromodynamics (QCD), higher order perturbative or non-perturbative effects can change this picture drastically. As Bjorken-$x$ decreases below one half, a rather peculiar event signature is predicted with increasing probability, where no radiation is present in one of the two Breit-frame hemispheres and all emissions are to be found in the other hemisphere. At higher orders in $α_s$ or in the presence of soft QCD effects, predictions of the rate of these events are far from trivial, and that motivates measurements with real data. We report on the first observation of the empty current hemisphere events in electron-proton collisions at the HERA collider using data recorded with the H1 detector at a center-of-mass energy of 319 GeV. The fraction of inclusive neutral-current DIS events with an empty hemisphere is found to be $0.0112 \pm 3.9\,\%_\text{stat} \pm 4.5\,\%_\text{syst} \pm 1.6\,\%_\text{mod}$ in the selected kinematic region of $150< Q^2<1500$ GeV$^2$ and inelasticity $0.14< y<0.7$. The data sample corresponds to an integrated luminosity of 351.1 pb$^{-1}$, sufficient to enable differential cross section measurements of these events. The results show an enhanced discriminating power at lower Bjorken-$x$ among different Monte Carlo event generator predictions.

Submitted 1 August, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: 13 pages, 5 figures, 2 Tables. This version as accepted for publication

Report number: DESY-24-034

Journal ref: EPJC 84 (2024), 720

arXiv:2403.05518 [pdf, other]

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Authors: James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin

Abstract: While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior--for example, rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features. We construct a suite testing nine forms of biased reasoning on seven question-answering tasks, and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. As BCT generalizes to held-out biases and does not require gold labels, this method may hold promise for reducing biased reasoning from as-of-yet unknown biases and on tasks where supervision for ground truth reasoning is unavailable.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.19023 [pdf, ps, other]

Jointly Learning Selection Matrices For Transmitters, Receivers And Fourier Coefficients In Multichannel Imaging

Authors: Han Wang, Yiming Zhou, Eduardo Perez, Florian Roemer

Abstract: Strategic subsampling has become a focal point due to its effectiveness in compressing data, particularly in the Full Matrix Capture (FMC) approach in ultrasonic imaging. This paper introduces the Joint Deep Probabilistic Subsampling (J-DPS) method, which aims to learn optimal selection matrices simultaneously for transmitters, receivers, and Fourier coefficients. This task-based algorithm is realized by introducing a specialized measurement model and integrating a customized Complex Learned FISTA (CL-FISTA) network. We propose a parallel network architecture, partitioned into three segments corresponding to the three matrices, all working toward a shared optimization objective with adjustable loss allocation. A synthetic dataset is designed to reflect practical scenarios, and we provide quantitative comparisons with a traditional CRB-based algorithm, standard DPS, and J-DPS.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.13486 [pdf, other]

The 10 antipodal pairings of strongly involutive polyhedra

Authors: Javier Bracho, Eric Paulí Pérez, Luis Montejano, Jorge Luis Ramírez-Alfonsín

Abstract: It is known that strongly involutive polyhedra are closely related to self-dual maps where the antipodal function acts as duality isomorphism. Such a family of polyhedra appears in different combinatorial, topological and geometric contexts, and is thus attractive to study. In this note, we determine the 10 antipodal pairings among the classification of the 24 self-dual pairings $Dual(G)\rhd Aut(G)$ of self-dual maps $G$. We also present the orbifold associated to each antipodal pairing and describe explicitly the corresponding fundamental regions. We finally explain how to construct two infinite families of strongly involutive polyhedra (one of them new) by using their doodles and the action of the corresponding orbifolds.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 16 pages, 21 figures, 1 table

arXiv:2402.12009 [pdf, other]

Moduli of Continuity in Metric Models and Extension of Liveability Indices

Authors: R. Arnau, J. M. Calabuig, Álvaro González, Enrique A. Sánchez Pérez

Abstract: Index spaces serve as valuable metric models for studying properties relevant to various applications, such as social science or economics. These properties are represented by real Lipschitz functions that describe the degree of association with each element within the underlying metric space. After determining the index value within a given sample subset, the classic McShane and Whitney formulas allow a Lipschitz regression procedure to be performed to extend the index values over the entire metric space. To improve the adaptability of the metric model to specific scenarios, this paper introduces the concept of a composition metric, which involves composing a metric with an increasing, positive and subadditive function $φ$. The results presented here extend well-established results for Lipschitz indices on metric spaces to composition metrics. In addition, we establish the corresponding approximation properties that facilitate the use of this functional structure. To illustrate the power and simplicity of this mathematical framework, we provide a concrete application involving the modelling of livability indices in North American cities.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.09987 [pdf, other]

Measuring the angle $α_{ds}$ of the flattest Unitary Triangle with $\overline{B}_{d}\to φ\overline{K}^{(*)0},\overline{B}_{s}\to φ{K}^{(*)0}$ decays

Authors: Roy Aleksan, Luis Oliver, Emmanuel Perez

Abstract: We show that the angle $α_{ds}$ of the ``flattest'' unitarity triangle can be directly measured using the decays $\overline{B}_{d}\to φ\overline{K}^{(*)0}$ and $\overline{B}_{s}\to φ{K}^{(*)0}$. Using both $\overline{B}_{d}$ and $\overline{B}_{s}$ enables a further consistency test since the expected time-dependent CP violating asymmetries are identical though with opposite signs. Since large statistics of $\overline{B}_{d}$ and $\overline{B}_{s}$ are needed for accurate measurements, FCC-ee and its environment at the Z-pole are well suited for such studies. These measurements, the precision of which could reach the sub-degree level, will help to probe further the consistency of the CP sector of the Standard Model with an unprecedented level of accuracy. The main detector requirements that are set by these measurements are also outlined.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08074 [pdf, ps, other]

On the connections between the low dimensional homology groups of $\textrm{SL}_2$ and $\textrm{PSL}_2$

Authors: Behrooz Mirzaii, Elvis Torres Pérez

Abstract: In this article we study the low dimensional homology groups of the special linear group $\textrm{SL}_2(A)$ and the projective special linear group $\textrm{PSL}_2(A)$, $A$ a domain, through the natural surjective map $\textrm{SL}_2(A) \to \textrm{PSL}_2(A)$. In particular, we study the connection of the first, the second and the third homology groups of these groups over Euclidean domains $\mathbb{Z}[\frac{1}{m}]$, $m$ a square free integer, and local domains.

Submitted 18 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.06782 [pdf, other]

Debating with More Persuasive LLMs Leads to More Truthful Answers

Authors: Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez

Abstract: Common methods for aligning large language models (LLMs) with desired behaviour heavily rely on human-labelled data. However, as models grow increasingly sophisticated, they will surpass human expertise, and the role of human evaluation will evolve into non-experts overseeing experts. In anticipation of this, we ask: can weaker models assess the correctness of stronger models? We investigate this question in an analogous setting, where stronger models (experts) possess the necessary information to answer questions and weaker models (non-experts) lack this information. The method we evaluate is debate, where two LLM experts each argue for a different answer, and a non-expert selects the answer. We find that debate consistently helps both non-expert models and humans answer questions, achieving 76% and 88% accuracy respectively (naive baselines obtain 48% and 60%). Furthermore, optimising expert debaters for persuasiveness in an unsupervised manner improves non-expert ability to identify the truth in debates. Our results provide encouraging empirical evidence for the viability of aligning models with debate in the absence of ground truth.

Submitted 25 July, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: For code please check: https://github.com/ucl-dark/llm_debate

arXiv:2401.16883 [pdf, other]

A survey for variable young stars with small telescopes: VIII -- Properties of 1687 Gaia selected members in 21 nearby clusters

Authors: Dirk Froebrich, Aleks Scholz, Justyn Campbell-White, Siegfried Vanaverbeke, Carys Herbert, Jochen Eislöffel, Thomas Urtly, Timothy P. Long, Ivan L. Walton, Klaas Wiersema, Nick J. Quinn, Tony Rodda, Juan-Luis González-Carballo, Mario Morales Aimar, Rafael Castillo García, Francisco C. Soldán Alfaro, Faustino García de la Cuesta, Domenico Licchelli, Alex Escartin Perez, José Luis Salto González, Marc Deldem, Stephen R. L. Futcher, Tim Nelson, Shawn Dvorak, Dawid Moździerski , et al. (38 additional authors not shown)

Abstract: The Hunting Outbursting Young Stars (HOYS) project performs long-term, optical, multi-filter, high cadence monitoring of 25 nearby young clusters and star forming regions. Utilising Gaia DR3 data we have identified about 17000 potential young stellar members in 45 coherent astrometric groups in these fields. Twenty one of them are clear young groups or clusters of stars within one kiloparsec and they contain 9143 Gaia selected potential members. The cluster distances, proper motions and membership numbers are determined. We analyse long term (about 7yr) V, R, and I-band light curves from HOYS for 1687 of the potential cluster members. One quarter of the stars are variable in all three optical filters, and two thirds of these have light curves that are symmetric around the mean. Light curves affected by obscuration from circumstellar materials are more common than those affected by accretion bursts, by a factor of 2-4. The variability fraction in the clusters ranges from 10 to almost 100 percent, and correlates positively with the fraction of stars with detectable inner disks, indicating that a lot of variability is driven by the disk. About one in six variables shows detectable periodicity, mostly caused by magnetic spots. Two thirds of the periodic variables with disk excess emission are slow rotators, and amongst the stars without disk excess two thirds are fast rotators - in agreement with rotation being slowed down by the presence of a disk.

Submitted 30 January, 2024; originally announced January 2024.

Comments: accepted for publication in MNRAS, 1 table, 9 figures

arXiv:2401.12485 [pdf, other]

Adiabatic Quantum Support Vector Machines

Authors: Prasanna Date, Dong Jun Woun, Kathleen Hamilton, Eduardo A. Coello Perez, Mayanka Chandra Shekhar, Francisco Rios, John Gounley, In-Saeng Suh, Travis Humble, Georgia Tourassi

Abstract: Adiabatic quantum computers can solve difficult optimization problems (e.g., the quadratic unconstrained binary optimization problem), and they seem well suited to train machine learning models. In this paper, we describe an adiabatic quantum approach for training support vector machines. We show that the time complexity of our quantum approach is an order of magnitude better than the classical approach. Next, we compare the test accuracy of our quantum approach against a classical approach that uses the Scikit-learn library in Python across five benchmark datasets (Iris, Wisconsin Breast Cancer (WBC), Wine, Digits, and Lambeq). We show that our quantum approach obtains accuracies on par with the classical approach. Finally, we perform a scalability study in which we compute the total training times of the quantum approach and the classical approach with increasing number of features and number of data points in the training dataset. Our scalability results show that the quantum approach obtains a 3.5--4.5 times speedup over the classical approach on datasets with many (millions of) features.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.06330 [pdf, ps, other]

The abelianization of the elementary group of rank two

Authors: Behrooz Mirzaii, Elvis Torres Pérez

Abstract: For an arbitrary ring $A$, we study the abelianization of the elementary group $\textrm{E}_2(A)$. In particular, we show that for a commutative ring $A$ there exists an exact sequence \[ K_2(2,A)/C(2,A) \to A/M \to \textrm{E}_2(A)^\textrm{ab} \to 1, \] where $C(2,A)$ is the central subgroup of the Steinberg group $\textrm{St}(2,A)$ generated by the Steinberg symbols and $M$ is the additive subgroup of $A$ generated by $x(a^2-1)$ and $3(b+1)(c+1)$, with $x\in A$, $a,b,c \in A^{\times}$.

Submitted 12 October, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.05566 [pdf, other]

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Authors: Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna Kravec, et al. (14 additional authors not shown)

Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

Submitted 17 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: updated to add missing acknowledgements

arXiv:2401.04341 [pdf, ps, other]

The Third Homology of Projective Special Linear Group of Rank Two

Authors: Behrooz Mirzaii, Elvis Torres Pérez

Abstract: In this paper we investigate the third homology of the projective special linear group $\textrm{PSL}_2(A)$. As a result of our investigation we prove a projective refined Bloch-Wigner exact sequence over certain class of rings. The projective Bloch-Wigner exact sequence over algebraically closed fields is a classical result.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.14288 [pdf]

The Status and Prospects of Phytoremediation of Heavy Metals

Authors: Aniruddha Acharya, Enrique Perez, Miller Maddox-Mandolini, Hania De La Fuente

Abstract: The release of heavy metals into the agricultural soil and waterbodies has been accelerated due to anthropogenic activities. They are not usually required for biological functions thus, their accumulation in biological system poses serious threat to health and environment globally. Phytoremediation offers a safe, inexpensive, and ecologically sustainable technique to clean habitats contaminated with heavy metals. Though several plants have been identified and used as a potential candidate for such phytoremediation, the technique is still at its formative stage and has been mostly confined to laboratory and greenhouses. However, recently several field studies have shown promising results that can propel large-scale implementation of this technology in industrial sites and urban agriculture. Realistically, the commercialization of this technique is possible if interdisciplinary approach is employed to increase its efficiency. This review presents a comprehensive narration of the status and future of the technique. It illustrates the concept of phytoremediation, the ecological and commercial benefits, and the types of phytoremediation. The candidate plants and factors that influences phytoremediation has been discussed. Finally, the physiological and molecular mechanism along with the future of the technique has been described.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 34 pages, 3 figures, 2 tables, review paper

arXiv:2312.13342 [pdf, other]

SENSEI: First Direct-Detection Results on sub-GeV Dark Matter from SENSEI at SNOLAB

Authors: SENSEI Collaboration, Prakruth Adari, Itay M. Bloch, Ana M. Botti, Mariano Cababie, Gustavo Cancelo, Brenda A. Cervantes-Vergara, Michael Crisler, Miguel Daal, Ansh Desai, Alex Drlica-Wagner, Rouven Essig, Juan Estrada, Erez Etzion, Guillermo Fernandez Moroni, Stephen E. Holland, Yonatan Kehat, Yaron Korn, Ian Lawson, Steffon Luoma, Aviv Orly, Santiago E. Perez, Dario Rodrigues, Nathan A. Saffold, Silvia Scorza, et al. (12 additional authors not shown)

Abstract: We present the first results from a dark matter search using six Skipper-CCDs in the SENSEI detector operating at SNOLAB. With an exposure of 534.9 gram-days from well-performing sensors, we select events containing 2 to 10 electron-hole pairs. After aggressively masking images to remove backgrounds, we observe 55 two-electron events, 4 three-electron events, and no events containing 4 to 10 electrons. The two-electron events are consistent with pileup from one-electron events. Among the 4 three-electron events, 2 appear in pixels that are likely impacted by detector defects, although not strongly enough to trigger our "hot-pixel" mask. We use these data to set world-leading constraints on sub-GeV dark matter interacting with electrons and nuclei.

Submitted 20 December, 2023; originally announced December 2023.

Comments: 5 pages, 2 figures, + Supplemental Materials (5 pages, 5 figures) + References

Report number: YITP-SB-2023-30, FERMILAB-PUB-23-0824-CSAID-PPD

arXiv:2311.08576 [pdf, other]

Towards Evaluating AI Systems for Moral Status Using Self-Reports

Authors: Ethan Perez, Robert Long

Abstract: As AI systems become more advanced and widely deployed, there will likely be increasing debate over whether AI systems could have conscious experiences, desires, or other states of potential moral significance. It is important to inform these discussions with empirical evidence to the extent possible. We argue that under the right circumstances, self-reports, or an AI system's statements about its own internal states, could provide an avenue for investigating whether AI systems have states of moral significance. Self-reports are the main way such states are assessed in humans ("Are you in pain?"), but self-reports from current systems like large language models are spurious for many reasons (e.g. often just reflecting what humans would say). To make self-reports more appropriate for this purpose, we propose to train models to answer many kinds of questions about themselves with known answers, while avoiding or limiting training incentives that bias self-reports. The hope of this approach is that models will develop introspection-like capabilities, and that these capabilities will generalize to questions about states of moral significance. We then propose methods for assessing the extent to which these techniques have succeeded: evaluating self-report consistency across contexts and between similar models, measuring the confidence and resilience of models' self-reports, and using interpretability to corroborate self-reports.
We also discuss challenges for our approach, from philosophical difficulties in interpreting self-reports to technical reasons why our proposal might fail. We hope our discussion inspires philosophers and AI researchers to criticize and improve our proposed methodology, as well as to run experiments to test whether self-reports can be made reliable enough to provide information about states of moral significance.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.05949 [pdf, other]

Lax-type pairs in the theory of bivariate orthogonal polynomials

Authors: Amílcar Branquinho, Ana Foulquié-Moreno, Teresa E. Pérez, Miguel A. Piñar

Abstract: Sequences of bivariate orthogonal polynomials written as vector polynomials of increasing size satisfy a couple of three term relations with matrix coefficients. In this work, introducing a time-dependent parameter, we analyse a Lax-type pair system for the coefficients of the three term relations. We also deduce several characterizations relating the Lax-type pair, the shape of the weight, Stieltjes function, moments, a differential equation for the weight, and the bidimensional Toda-type systems.

Submitted 10 November, 2023; originally announced November 2023.

MSC Class: 42C05; 33C50; 35Q53

arXiv:2310.13798 [pdf, other]

Specific versus General Principles for Constitutional AI

Authors: Sandipan Kundu, Yuntao Bai, Saurav Kadavath, Amanda Askell, Andrew Callahan, Anna Chen, Anna Goldie, Avital Balwit, Azalia Mirhoseini, Brayden McLean, Catherine Olsson, Cassie Evraets, Eli Tran-Johnson, Esin Durmus, Ethan Perez, Jackson Kernion, Jamie Kerr, Kamal Ndousse, Karina Nguyen, Nelson Elhage, Newton Cheng, Nicholas Schiefer, Nova DasSarma, Oliver Rausch, Robin Larson, et al. (11 additional authors not shown)

Abstract: Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expression of such behaviors. The success of simple principles motivates us to ask: can models learn general ethical behaviors from only a single written principle? To test this, we run experiments using a principle roughly stated as "do what's best for humanity". We find that the largest dialogue models can generalize from this short constitution, resulting in harmless assistants with no stated interest in specific motivations like power. A general principle may thus partially avoid the need for a long list of constitutions targeting potentially harmful behaviors. However, more detailed constitutions still improve fine-grained control over specific types of harms. This suggests both general and specific principles have value for steering AI safely.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.13548 [pdf, other]

Towards Understanding Sycophancy in Language Models

Authors: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez

Abstract: Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior, we analyze existing human preference data. We find that when a response matches a user's views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of state-of-the-art AI assistants, likely driven in part by human preference judgments favoring sycophantic responses.

Submitted 27 October, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: 32 pages, 20 figures

ACM Class: I.2.6

arXiv:2310.12921 [pdf, other]

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Authors: Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner

Abstract: Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we only provide a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second "baseline" prompt and projecting out parts of the CLIP embedding space irrelevant to distinguish between goal and baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes of VLM-RMs we encountered are all related to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough.
This suggests that future VLMs will become more and more useful reward models for a wide range of RL applications.

Submitted 14 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Presented at International Conference on Learning Representations (ICLR) 2024

arXiv:2310.08239 [pdf, ps, other]

Centrosymmetric and reverse matrices in bivariate orthogonal polynomials

Authors: Cleonice F. Bracciali, Glalco S. Costa, Teresa E. Pérez

Abstract: We introduce the concept of reflexive moment functional in two variables and the definition of reflexive orthogonal polynomial system. Also reverse matrices and their interesting algebraic properties are studied. Reverse matrices and reflexive polynomial systems are directly connected in the context of bivariate orthogonal polynomials. Centrosymmetric matrices, reverse matrices and their connections with reflexive orthogonal polynomial systems are presented. Finally, several particular cases and examples are analysed.

Submitted 12 October, 2023; originally announced October 2023.

Comments: 24 pages

MSC Class: 42C05; 33C50; 15A09; 15B99

arXiv:2310.07173 [pdf]

Unleashing quantum algorithms with Qinterpreter: bridging the gap between theory and practice across leading quantum computing platforms

Authors: Wilmer Contreras Sepúlveda, Ángel David Torres-Palencia, José Javier Sánchez Mondragón, Braulio Misael Villegas-Martínez, J. Jesús Escobedo-Alatorre, Sandra Gesing, Néstor Lozano-Crisóstomo, Julio César García-Melgarejo, Juan Carlos Sánchez Pérez, Eddie Nelson Palacios-Pérez, Omar PalilleroSandoval

Abstract: Quantum computing is a rapidly emerging and promising field that has the potential to revolutionize numerous research domains, including drug design, network technologies and sustainable energy. Due to the inherent complexity and divergence from classical computing, several major quantum computing libraries have been developed to implement quantum algorithms, namely IBM Qiskit, Amazon Braket, Cirq, PyQuil, and PennyLane. These libraries allow for quantum simulations on classical computers and facilitate program execution on corresponding quantum hardware, e.g., Qiskit programs on IBM quantum computers. While all platforms have some differences, the main concepts are the same. QInterpreter is a tool embedded in the Quantum Science Gateway QubitHub using Jupyter Notebooks that translates seamlessly programs from one library to the other and visualizes the results. It combines the five well-known quantum libraries into a unified framework. Designed as an educational tool for beginners, Qinterpreter enables the development and execution of quantum circuits across various platforms in a straightforward way. The work highlights the versatility and accessibility of Qinterpreter in quantum programming and underscores our ultimate goal of pervading Quantum Computing through younger, less specialized, and diverse cultural and national communities.

Submitted 16 October, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Final article submitted to Peer J computer science Journal

arXiv:2309.15937 [pdf, ps, other]

The pluriclosed flow for $T^2$-invariant Vaisman metrics on the Kodaira-Thurston surface

Authors: Anna Fino, Gueo Grantcharov, Eddy Perez

Abstract: In this note we study $T^2$-invariant pluriclosed metrics on the Kodaira-Thurston surface. We obtain a characterization of $T^2$-invariant Vaisman metrics, and notice that the Kodaira-Thurston surface admits Vaisman metrics with non-constant scalar curvature. Then we study the behaviour of the Vaisman condition in relation to the pluriclosed flow. As a consequence, we show that if the initial metric on the Kodaira-Thurston surface is a $T^2$-invariant Vaisman metric, then the pluriclosed flow preserves the Vaisman condition, extending to the non-constant scalar curvature case the previous results in [6].

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2308.04803 [pdf, ps, other]

Extreme Value Theory-based Robust Minimum-Power Precoding for URLLC

Authors: Dian Echevarría Pérez, Onel L. Alcaraz López, Hirley Alves

Abstract: Channel state information (CSI) is crucial for achieving ultra-reliable low-latency communication (URLLC) in wireless networks. The main associated problems are the CSI acquisition time, which impacts the delay requirements of time-critical applications, and the estimation accuracy, which degrades the signal-to-interference-plus-noise ratio (SINR), thus, reducing reliability. In this work, we formulate and solve a minimum-power precoding design problem simultaneously serving multiple URLLC users in the downlink with imperfect CSI availability. Specifically, we develop an algorithm that exploits state-of-the-art precoding schemes such as the maximal ratio transmission (MRT) and zero-forcing (ZF), and adjust the power of the precoders to compensate for the channel estimation error uncertainty based on the extreme value theory (EVT) framework. Finally, we evaluate the performance of our method and show its superiority concerning worst-case robust precoding, which is used as a benchmark.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 11 pages, 9 figures, submitted to TWC

arXiv:2308.03296 [pdf, other]

Studying Large Language Model Generalization with Influence Functions

Authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman

Abstract: When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped.
Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 119 pages, 47 figures, 22 tables

arXiv:2307.13702 [pdf, other]

Measuring Faithfulness in Chain-of-Thought Reasoning

Authors: Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, et al. (5 additional authors not shown)

Abstract: Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning on most tasks we study. Overall, our results suggest that CoT can be faithful if the circumstances such as the model size and task are carefully chosen.

Submitted 16 July, 2023; originally announced July 2023.

Showing 1–50 of 431 results for author: Perez, E