-
Flux-pump induced degradation of $T_1$ for dissipative cat qubits
Authors:
Léon Carde,
Pierre Rouchon,
Joachim Cohen,
Alexandru Petrescu
Abstract:
Dissipative stabilization of cat qubits autonomously corrects for bit-flip errors by ensuring that reservoir-engineered two-photon losses dominate over other mechanisms inducing phase-flip errors. To describe the latter, we derive an effective master equation for an asymmetrically threaded SQUID-based superconducting circuit used to stabilize a dissipative cat qubit. We analyze the dressing of relaxation processes under drives in time-dependent Schrieffer-Wolff perturbation theory for weakly anharmonic bosonic degrees of freedom, and in numerically exact Floquet theory. We find that spurious single-photon decay rates can increase under the action of the parametric pump that generates the required interactions for cat-qubit stabilization. Our analysis feeds into mitigation strategies that can inform current experiments, and the methods presented here can be extended to other circuit implementations.
Submitted 1 October, 2024;
originally announced October 2024.
-
Inferring the parameters of Taylor's law in ecology
Authors:
Lionel Truquet,
Joel E. Cohen,
Paul Doukhan
Abstract:
Taylor's power law (TL) or fluctuation scaling has been verified empirically for the abundances of many species, human and non-human, and in many other fields including physics, meteorology, computer science, and finance. TL asserts that the variance is directly proportional to a power of the mean, exactly for population moments and, whether or not population moments exist, approximately for sample moments. In many papers, linear regression of log variance as a function of log mean is used to estimate TL's parameters. We provide some statistical guarantees with large-sample asymptotics for this kind of inference under general conditions, and we derive confidence intervals for the parameters. In many ecological applications, the means and variances are estimated over time or across space from arrays of abundance data collected at different locations and time points. When the ratio between the time-series length and the number of spatial points converges to a constant as both become large, the usual normalized statistics are asymptotically biased. We provide a bias correction to get correct confidence intervals. TL, widely studied in multiple sciences, is a source of challenging new statistical problems in a nonstationary spatiotemporal framework. We illustrate our results with both simulated and real data sets.
Submitted 27 August, 2024;
originally announced August 2024.
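
The regression described in the Taylor's law abstract above (log variance fitted as a linear function of log mean) can be sketched as follows. This is a minimal illustration only, not the authors' estimator: it omits the confidence intervals and the bias correction the abstract refers to, and the function name `fit_taylor_law` and the simulated Poisson abundance array are assumptions made for the example.

```python
import numpy as np


def fit_taylor_law(means, variances):
    """Ordinary least squares fit of log(variance) = log(a) + b * log(mean).

    Returns the pair (a, b), so that variance is approximately a * mean**b.
    """
    log_m = np.log(np.asarray(means, dtype=float))
    log_v = np.log(np.asarray(variances, dtype=float))
    # np.polyfit returns coefficients from highest degree to lowest.
    b, log_a = np.polyfit(log_m, log_v, 1)
    return float(np.exp(log_a)), float(b)


# Hypothetical abundance array: rows are locations, columns are time points.
rng = np.random.default_rng(0)
site_means = np.linspace(2.0, 50.0, 20)
counts = rng.poisson(lam=site_means[:, None], size=(20, 200))

# Sample mean and sample variance over time for each location.
a_hat, b_hat = fit_taylor_law(counts.mean(axis=1), counts.var(axis=1, ddof=1))
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

Because a Poisson variable has variance equal to its mean, the fitted exponent for this simulated array should come out near 1, up to sampling noise.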
-
Investigating Complex HPV Dynamics Using Emulation and History Matching
Authors:
Andrew Iskauskas,
Jamie A. Cohen,
Danny Scarponi,
Ian Vernon,
Michael Goldstein,
Daniel Klein,
Richard G. White,
Nicky McCreesh
Abstract:
The study of transmission and progression of human papillomavirus (HPV) is crucial for understanding the incidence of cervical cancers, and has been identified as a priority worldwide. The complexity of the disease necessitates a detailed model of HPV transmission and its progression to cancer; to infer properties of the above we require a careful process that can match to imperfect or incomplete observational data. In this paper, we describe the HPVsim simulator to satisfy the former requirement; to satisfy the latter we couple this stochastic simulator to a process of emulation and history matching using the R package hmer. With these tools, we are able to obtain a comprehensive collection of parameter combinations that could give rise to observed cancer data, and explore the implications of the variability of these parameter sets as it relates to future health interventions.
Submitted 28 August, 2024;
originally announced August 2024.
-
Within-host infection dynamics with master equations and the method of moments: A case study of human papillomavirus in the epithelium
Authors:
Mariah C. Boudreau,
Jamie A. Cohen,
Laurent Hébert-Dufresne
Abstract:
Master equations provide researchers with the ability to track the distribution over possible states of a system. From these equations, we can summarize the temporal dynamics through a method of moments. These distributions and their moments capture the stochastic nature of a system, which is essential to study infectious diseases. In this paper, we define the states of the system to be the number of infected cells of a given type in the epithelium, the hollow organ tissue in the human body. Epithelium found in the cervix provides a location for viral infections to live and persist, such as human papillomavirus (HPV). HPV is a highly transmissible disease which most commonly affects biological females and has the potential to progress into cervical cancer. By defining a master equation model which tracks the infected cell layer dynamics, information on disease extinction, progression, and viral output can be derived from the method of moments. From this methodology and the outcomes we glean from it, we aim to inform differing states of HPV infected cells, and assess the effects of structural information for each outcome.
Submitted 9 August, 2024;
originally announced August 2024.
-
Canadian Traveller Problems in Temporal Graphs
Authors:
Thomas Bellitto,
Johanne Cohen,
Bruno Escoffier,
Minh-Hang Nguyen,
Mikael Rabie
Abstract:
This paper formalises the Canadian Traveller problem as a positional two-player game on graphs. We consider two variants depending on when the traveller learns whether an edge is blocked. In the locally-informed variant, the traveller learns if an edge is blocked upon reaching one of its endpoints, while in the uninformed variant, they discover this only when the edge is supposed to appear. We provide a polynomial algorithm for each shortest path variant in the uninformed case. This algorithm also solves the case of directed acyclic non-temporal graphs.
In the locally-informed case, we prove that finding a winning strategy is PSPACE-complete. Moreover, we establish that the problem is polynomial-time solvable when $k=1$ but NP-hard for $k\geq 2$.
Additionally, we show that the standard (non-temporal) Canadian Traveller Problem is NP-hard when there are $k\geq 4$ blocked edges, which is, to the best of our knowledge, the first hardness result for CTP for a constant number of blocked edges.
Submitted 23 July, 2024;
originally announced July 2024.
-
Siegel $\mathfrak{p}^2$ Vectors for Representations of $GSp(4)$
Authors:
Jonathan Cohen
Abstract:
Let $F$ be a $p$-adic field and $(\pi, V)$ an irreducible complex representation of $G=GSp(4, F)$ with trivial central character. Let ${\rm Si}(\mathfrak{p}^2)\subset G$ denote the Siegel congruence subgroup of level $\mathfrak{p}^2$ and $u\in N_G({\rm Si}(\mathfrak{p}^2))$ the Atkin-Lehner element. We compute the dimension of the space of ${\rm Si}(\mathfrak{p}^2)$-fixed vectors in $V$ as well as the signatures of the involutions $\pi(u)$ acting on these spaces.
Submitted 22 July, 2024;
originally announced July 2024.
-
Bessel Models for Representations of $GSp(4, q)$
Authors:
Jonathan Cohen
Abstract:
We compute the Bessel models of irreducible representations of the finite group $GSp(4, q)$.
Submitted 27 June, 2024;
originally announced June 2024.
-
Nemotron-4 340B Technical Report
Authors:
Nvidia,
Bo Adler,
Niket Agarwal,
Ashwath Aithal,
Dong H. Anh,
Pallab Bhattacharya,
Annika Brundyn,
Jared Casper,
Bryan Catanzaro,
Sharon Clay,
Jonathan Cohen,
Sirshak Das,
Ayush Dattagupta,
Olivier Delalleau,
Leon Derczynski,
Yi Dong,
Daniel Egert,
Ellie Evans,
Aleksander Ficek,
Denys Fridman,
Shaona Ghosh,
Boris Ginsburg,
Igor Gitman,
Tomasz Grzegorzek
, et al. (58 additional authors not shown)
Abstract:
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and their outputs. These models perform competitively with open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.
Submitted 6 August, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Merlin: A Vision Language Foundation Model for 3D Computed Tomography
Authors:
Louis Blankemeier,
Joseph Paul Cohen,
Ashwin Kumar,
Dave Van Veen,
Syed Jamal Safdar Gardezi,
Magdalini Paschali,
Zhihong Chen,
Jean-Benoit Delbrouck,
Eduardo Reis,
Cesar Truyts,
Christian Bluethgen,
Malte Engmann Kjeldskov Jensen,
Sophie Ostmeier,
Maya Varma,
Jeya Maria Jose Valanarasu,
Zhongnan Fang,
Zepeng Huo,
Zaid Nabulsi,
Diego Ardila,
Wei-Hung Weng,
Edson Amaro Junior,
Neera Ahuja,
Jason Fries,
Nigam H. Shah,
Andrew Johnston
, et al. (6 additional authors not shown)
Abstract:
Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin, a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model-adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically relevant evaluations, we assess the efficacy of various network architectures and training strategies to show that Merlin performs favorably compared with existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU.
Submitted 10 June, 2024;
originally announced June 2024.
-
Gaps Between Consecutive Primes and the Exponential Distribution
Authors:
Joel E. Cohen
Abstract:
Based on the primes less than $4 \times 10^{18}$, Oliveira e Silva et al. (2014) conjectured an asymptotic formula for the sum of the $k$th power of the gaps between consecutive primes less than a large number $x$. We show that the conjecture of Oliveira e Silva holds if and only if the $k$th moment of the first $n$ gaps is asymptotic to the $k$th moment of an exponential distribution with mean $\log n$, though the distribution of gaps is not exponential. Asymptotically exponential moments imply that the gaps asymptotically obey Taylor's law of fluctuation scaling: variance of the first $n$ gaps $\sim$ (mean of the first $n$ gaps)$^2$. If the distribution of the first $n$ gaps is asymptotically exponential with mean $\log n$, then the expectation of the largest of the first $n$ gaps is asymptotic to $(\log n)^2$. The largest of the first $n$ gaps is asymptotic to $(\log n)^2$ if and only if the Cramér-Shanks conjecture holds. Numerical counts of gaps and the maximal gap $G_n$ among the first $n$ gaps test these results. While most values of $G_n$ are better approximated by $(\log n)^2$ than by other models, seven values of $n$ with $G_{n} >2 e^{-\gamma}(\log n)^2$ suggest that $\limsup_{n \to\infty} G_n/[2e^{-\gamma}(\log n)^2]$ may exceed 1.
Submitted 12 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
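
The moment comparison in the prime-gaps abstract above can be tried numerically on a small scale. The sketch below is an illustration under stated assumptions, not the author's computation: the helper names (`primes_below`, `prime_gaps`, `empirical_moment`, `exponential_moment`) and the cutoff of $10^6$ are chosen for the example. It uses the standard fact that the $k$th moment of an exponential distribution with mean $\mu$ is $k!\,\mu^k$.

```python
import math


def primes_below(limit):
    """Sieve of Eratosthenes: all primes p with p < limit (limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit - 1) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def prime_gaps(limit):
    """Gaps between consecutive primes below limit."""
    ps = primes_below(limit)
    return [q - p for p, q in zip(ps, ps[1:])]


def empirical_moment(gaps, k):
    """k-th raw sample moment of the gaps."""
    return sum(g ** k for g in gaps) / len(gaps)


def exponential_moment(mean, k):
    """k-th raw moment of an exponential distribution: k! * mean**k."""
    return math.factorial(k) * mean ** k


gaps = prime_gaps(10 ** 6)
n = len(gaps)
for k in (1, 2, 3):
    print(k, empirical_moment(gaps, k), exponential_moment(math.log(n), k))
```

The statements in the abstract are asymptotic, so close agreement should not be expected at a cutoff this small; the loop only shows how the two quantities are set side by side.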
-
The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
Authors:
Paiheng Xu,
Jing Liu,
Nathan Jones,
Julie Cohen,
Wei Ai
Abstract:
Assessing instruction quality is a fundamental component of any improvement effort in the education system. However, traditional manual assessments are expensive, subjective, and heavily dependent on observers' expertise and idiosyncratic factors, preventing teachers from getting timely and frequent feedback. Unlike prior research, which mostly focuses on low-inference instructional practices one at a time, this paper presents the first study that leverages Natural Language Processing (NLP) techniques to assess multiple high-inference instructional practices in two distinct educational settings: in-person K-12 classrooms and simulated performance tasks for pre-service teachers. This is also the first study that applies NLP to measure a teaching practice that is widely acknowledged to be particularly effective for students with special needs. We confront two challenges inherent in NLP-based instructional analysis: noisy and long input data, and highly skewed distributions of human ratings. Our results suggest that pretrained Language Models (PLMs) demonstrate performance comparable to the agreement level of human raters for variables that are more discrete and require lower inference, but their efficacy diminishes with more complex teaching practices. Interestingly, using only teachers' utterances as input yields strong results for student-centered variables, alleviating common concerns over the difficulty of collecting and transcribing high-quality student speech data in in-person teaching settings. Our findings highlight both the potential and the limitations of current NLP techniques in the education domain, opening avenues for further exploration.
Submitted 3 April, 2024;
originally announced April 2024.
-
Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models
Authors:
Jeremy E. Cohen,
Valentin Leplat
Abstract:
Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because of the multifactor nature of these models and the lack of theory to back these choices. This paper aims to address these issues. By studying a more general model called the Homogeneous Regularized Scale-Invariant, we prove that the scale-invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects. This observation allows us to better understand the effect of regularization functions in low-rank approximation models, to guide the choice of the regularization hyperparameters, and to design balancing strategies to enhance the convergence speed of dedicated optimization algorithms. Some of these results were already known but restricted to specific instances of regularized low-rank approximations. We also derive a generic Majorization Minimization algorithm that handles many regularized nonnegative low-rank approximations, with convergence guarantees. We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition and sparse Nonnegative Tucker Decomposition.
Submitted 8 June, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
An Ozsváth--Szabó-type spectral sequence for links in $S^1\times S^2$
Authors:
Jesse Cohen
Abstract:
We show that there is a spectral sequence with $E^2$-page given by the Khovanov homology of a link in $S^1\times S^2$, as defined by Rozansky in arXiv:1011.1958, which converges to the Hochschild homology of an $A_\infty$-bimodule defined in terms of bordered Floer invariants. We also show that the homology algebras $H_*\mathfrak{h}_n$ of the algebras $\mathfrak{h}_n$ over which these bimodules are defined give nontrivial $A_\infty$-deformations of Khovanov's arc algebras $H_n$ for $n>1$.
Submitted 14 March, 2024;
originally announced March 2024.
-
General Multipoles and Their Implications for Dark Matter Inference
Authors:
Jacob S. Cohen,
Christopher D. Fassnacht,
Conor M. O'Riordan,
Simona Vegetti
Abstract:
The flux ratios of strongly lensed quasars have previously been used to infer the properties of dark matter. In these analyses it is crucial to separate the effect of the main lensing galaxy and the low-mass dark matter halo population. In this work, we investigate flux-ratio perturbations resulting from general third- and fourth-order multipole perturbations to the main lensing galaxy's mass profile. We simulate four lens systems, each with a different lensing configuration, without multipoles. The simulated flux ratios are perturbed by 10-40 per cent by a population of low-mass haloes consistent with CDM and, in one case, also a satellite galaxy. This level of perturbation is comparable to the magnitude of flux-ratio anomalies in real data that has been previously analyzed. We then attempt to fit the simulated systems using multipoles instead of low-mass haloes. We find that multipoles with amplitudes of 0.01 or less can produce flux-ratio perturbations in excess of 40 per cent. In all cases, third- or fourth-order multipoles can individually reduce the magnitude of, if not eliminate, flux-ratio anomalies. When both multipole orders are jointly included, all simulated flux ratios can be fit to within the observational uncertainty. Our results indicate that low-mass haloes and multipoles are highly degenerate when modelling quadruply-imaged quasars based just on image positions and flux ratios. In the presence of this degeneracy, flux-ratio anomalies in lensed quasars alone cannot be used to place strong constraints on the properties of dark matter without additional information that can inform our priors.
Submitted 15 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Harnessing two-photon dissipation for enhanced quantum measurement and control
Authors:
Antoine Marquet,
Simon Dupouy,
Ulysse Réglade,
Antoine Essig,
Joachim Cohen,
Emanuele Albertinale,
Audrey Bienfait,
Théau Peronnin,
Sébastien Jezouin,
Raphaël Lescanne,
Benjamin Huard
Abstract:
Dissipation engineering offers a powerful tool for quantum technologies. Recently, new superconducting devices have achieved an engineered two-photon dissipation rate exceeding all other relevant timescales. In particular, they have proven most useful in preventing transitions between the logical states $|\pm\alpha\rangle$ of a cat qubit. Here, we present three key applications of strong two-photon dissipation for quantum measurement and control, beyond cat qubit stabilization. Firstly, we demonstrate its efficacy in overcoming limitations encountered in Wigner tomography at high photon numbers. Secondly, we showcase its potential for realizing universal gates on cat qubits, exploiting the coherent mapping between cat qubit states and superpositions of 0 and 1 photons. Finally, we harness the transient dynamics of a cat state under two-photon dissipation to prepare squeezed cat states with a squeezing factor exceeding 3.96$\pm$0.07 dB.
Submitted 24 September, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
Authors:
Shanka Subhra Mondal,
Jonathan D. Cohen,
Taylor W. Webb
Abstract:
Abstract visual reasoning is a characteristically human ability, allowing the identification of relational patterns that are abstracted away from object features, and the systematic generalization of those patterns to unseen problems. Recent work has demonstrated strong systematic generalization in visual reasoning tasks involving multi-object inputs, through the integration of slot-based methods used for extracting object-centric representations coupled with strong inductive biases for relational abstraction. However, this approach was limited to problems containing a single rule, and was not scalable to visual reasoning problems containing a large number of objects. Other recent work proposed Abstractors, an extension of Transformers that incorporates strong relational inductive biases, thereby inheriting the Transformer's scalability and multi-head architecture, but it has yet to be demonstrated how this approach might be applied to multi-object visual inputs. Here we combine the strengths of the above approaches and propose Slot Abstractors, an approach to abstract visual reasoning that can be scaled to problems involving a large number of objects and multiple relations among them. The approach displays state-of-the-art performance across four abstract visual reasoning tasks, as well as an abstract reasoning task involving real-world images.
Submitted 2 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
A Relational Inductive Bias for Dimensional Abstraction in Neural Networks
Authors:
Declan Campbell,
Jonathan D. Cohen
Abstract:
The human cognitive system exhibits remarkable flexibility and generalization capabilities, partly due to its ability to form low-dimensional, compositional representations of the environment. In contrast, standard neural network architectures often struggle with abstract reasoning tasks, overfit, and require extensive data for training. This paper investigates the impact of the relational bottleneck -- a mechanism that focuses processing on relations among inputs -- on the learning of factorized representations conducive to compositional coding and the attendant flexibility of processing. We demonstrate that such a bottleneck not only improves generalization and learning efficiency, but also aligns network performance with human-like behavioral biases. Networks trained with the relational bottleneck developed orthogonal representations of feature dimensions latent in the dataset, reflecting the factorized structure thought to underlie human cognitive flexibility. Moreover, the relational network mimics human biases towards regularity without pre-specified symbolic primitives, suggesting that the bottleneck fosters the emergence of abstract representations that confer flexibility akin to symbols.
Submitted 28 February, 2024;
originally announced February 2024.
-
Nemotron-4 15B Technical Report
Authors:
Jupinder Parmar,
Shrimai Prabhumoye,
Joseph Jennings,
Mostofa Patwary,
Sandeep Subramanian,
Dan Su,
Chen Zhu,
Deepak Narayanan,
Aastha Jhunjhunwala,
Ayush Dattagupta,
Vibhu Jawa,
Jiwei Liu,
Ameya Mahabaleshwarkar,
Osvald Nitski,
Annika Brundyn,
James Maki,
Miguel Martinez,
Jiaxuan You,
John Kamalu,
Patrick LeGresley,
Denys Fridman,
Jared Casper,
Ashwath Aithal,
Oleksii Kuchaiev,
Mohammad Shoeybi
, et al. (2 additional authors not shown)
Abstract:
We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.
Submitted 27 February, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Optimal control of collective electrotaxis in epithelial monolayers
Authors:
Simon F. Martina-Perez,
Isaac B. Breinyn,
Daniel J. Cohen,
Ruth E. Baker
Abstract:
Epithelial monolayers are some of the best-studied models for collective cell migration due to their abundance in multicellular systems and their tractability. Experimentally, the collective migration of epithelial monolayers can be robustly steered e.g. using electric fields, via a process termed electrotaxis. Theoretically, however, the question of how to design an electric field to achieve a desired spatiotemporal movement pattern is underexplored. In this work, we construct and calibrate an ordinary differential equation model to predict the average velocity of the centre of mass of a cellular monolayer in response to stimulation with an electric field. We use this model, in conjunction with optimal control theory, to derive physically realistic optimal electric field designs to achieve a variety of aims, including maximising the total distance travelled by the monolayer, maximising the monolayer velocity, and keeping the monolayer velocity constant during stimulation. Together, this work is the first to present a unified framework for optimal control of collective monolayer electrotaxis and provides a blueprint to optimally steer collective migration using other external cues.
Submitted 13 February, 2024;
originally announced February 2024.
-
Human-Like Geometric Abstraction in Large Pre-trained Neural Networks
Authors:
Declan Campbell,
Sreejan Kumar,
Tyler Giallanza,
Thomas L. Griffiths,
Jonathan D. Cohen
Abstract:
Humans possess a remarkable capacity to recognize and manipulate abstract structure, which is especially apparent in the domain of geometry. Recent research in cognitive science suggests neural networks do not share this capacity, concluding that human geometric abilities come from discrete symbolic structure in human mental representations. However, progress in artificial intelligence (AI) suggests that neural networks begin to demonstrate more human-like reasoning after scaling up standard architectures in both model size and amount of training data. In this study, we revisit empirical results in cognitive science on geometric visual processing and identify three key biases in geometric visual processing: a sensitivity towards complexity, regularity, and the perception of parts and relations. We test tasks from the literature that probe these biases in humans and find that large pre-trained neural network models used in AI demonstrate more human-like abstract geometric processing.
Submitted 6 February, 2024;
originally announced February 2024.
-
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
Authors:
Zhihong Chen,
Maya Varma,
Jean-Benoit Delbrouck,
Magdalini Paschali,
Louis Blankemeier,
Dave Van Veen,
Jeya Maria Jose Valanarasu,
Alaa Youssef,
Joseph Paul Cohen,
Eduardo Pontes Reis,
Emily B. Tsai,
Andrew Johnston,
Cameron Olsen,
Tanishq Mathew Abraham,
Sergios Gatidis,
Akshay S. Chaudhari,
Curtis Langlotz
Abstract:
Chest X-rays (CXRs) are the most frequently performed imaging test in clinical practice. Recent advances in the development of vision-language foundation models (FMs) give rise to the possibility of performing automated CXR interpretation, which can assist physicians with clinical decision-making and improve patient outcomes. However, developing FMs that can accurately interpret CXRs is challenging due to the (1) limited availability of large-scale vision-language datasets in the medical image domain, (2) lack of vision and language encoders that can capture the complexities of medical data, and (3) absence of evaluation frameworks for benchmarking the abilities of FMs on CXR interpretation. In this work, we address these challenges by first introducing \emph{CheXinstruct} - a large-scale instruction-tuning dataset curated from 28 publicly-available datasets. We then present \emph{CheXagent} - an instruction-tuned FM capable of analyzing and summarizing CXRs. To build CheXagent, we design a clinical large language model (LLM) for parsing radiology reports, a vision encoder for representing CXR images, and a network to bridge the vision and language modalities. Finally, we introduce \emph{CheXbench} - a novel benchmark designed to systematically evaluate FMs across 8 clinically-relevant CXR interpretation tasks. Extensive quantitative evaluations and qualitative reviews with five expert radiologists demonstrate that CheXagent outperforms previously-developed general- and medical-domain FMs on CheXbench tasks. Furthermore, in an effort to improve model transparency, we perform a fairness evaluation across factors of sex, race and age to highlight potential performance disparities. Our project is at \url{https://stanford-aimi.github.io/chexagent.html}.
Submitted 22 January, 2024;
originally announced January 2024.
-
Quantifying cell cycle regulation by tissue crowding
Authors:
Carles Falcó,
Daniel J. Cohen,
José A. Carrillo,
Ruth E. Baker
Abstract:
The spatiotemporal coordination and regulation of cell proliferation is fundamental in many aspects of development and tissue maintenance. Cells have the ability to adapt their division rates in response to mechanical constraints, yet we do not fully understand how cell proliferation regulation impacts cell migration phenomena. Here, we present a minimal continuum model of cell migration with cell cycle dynamics, which includes density-dependent effects and hence can account for cell proliferation regulation. By combining minimal mathematical modelling, Bayesian inference, and recent experimental data, we quantify the impact of tissue crowding across different cell cycle stages in epithelial tissue expansion experiments. Our model suggests that cells sense local density and adapt cell cycle progression in response, during G1 and the combined S/G2/M phases, providing an explicit relationship between each cell cycle stage duration and local tissue density, which is consistent with several experimental observations. Finally, we compare our mathematical model predictions to different experiments studying cell cycle regulation and present a quantitative analysis on the impact of density-dependent regulation on cell migration patterns. Our work presents a systematic approach for investigating and analysing cell cycle data, providing mechanistic insights into how individual cells regulate proliferation, based on population-based experimental measurements.
Submitted 24 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Identifying Spurious Correlations using Counterfactual Alignment
Authors:
Joseph Paul Cohen,
Louis Blankemeier,
Akshay Chaudhari
Abstract:
Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual (CF) alignment method to detect and quantify spurious correlations of black box classifiers. Our methodology is based on counterfactual images generated with respect to one classifier being input into other classifiers to see if they also induce changes in the outputs of these classifiers. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists. This is validated by observing intuitive trends in face-attribute and waterbird classifiers, as well as by fabricating spurious correlations and detecting their presence, both visually and quantitatively. Furthermore, utilizing the CF alignment method, we demonstrate that we can evaluate robust optimization methods (GroupDRO, JTT, and FLAC) by detecting a reduction in spurious correlations.
Submitted 1 October, 2024; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm
Authors:
Axel Marmoret,
Jérémy E. Cohen,
Frédéric Bimbot
Abstract:
Music Structure Analysis (MSA) is a Music Information Retrieval task consisting of representing a song in a simplified, organized manner by breaking it down into sections typically corresponding to ``chorus'', ``verse'', ``solo'', etc. In this work, we extend an MSA algorithm called the Correlation Block-Matching (CBM) algorithm introduced by (Marmoret et al., 2020, 2022b). The CBM algorithm is a dynamic programming algorithm that segments self-similarity matrices, which are a standard description used in MSA and in numerous other applications. In this work, self-similarity matrices are computed from the feature representation of an audio signal and time is sampled at the bar-scale. This study examines three different standard similarity functions for the computation of self-similarity matrices. Results show that, in optimal conditions, the proposed algorithm achieves a level of performance which is competitive with supervised state-of-the-art methods while only requiring knowledge of bar positions. In addition, the algorithm is made open-source and is highly customizable.
Submitted 30 November, 2023;
originally announced November 2023.
-
Foundational Competencies and Responsibilities of a Research Software Engineer
Authors:
Florian Goth,
Renato Alves,
Matthias Braun,
Leyla Jael Castro,
Gerasimos Chourdakis,
Simon Christ,
Jeremy Cohen,
Stephan Druskat,
Fredo Erxleben,
Jean-Noël Grad,
Magnus Hagdorn,
Toby Hodges,
Guido Juckeland,
Dominic Kempf,
Anna-Lena Lamprecht,
Jan Linxweiler,
Frank Löffler,
Michele Martone,
Moritz Schwarzmeier,
Heidi Seibold,
Jan Philipp Thiele,
Harald von Waldow,
Samantha Wittke
Abstract:
The term Research Software Engineer, or RSE, emerged a little over 10 years ago as a way to represent individuals working in the research community but focusing on software development. The term has been widely adopted and there are a number of high-level definitions of what an RSE is. However, the roles of RSEs vary depending on the institutional context they work in. At one end of the spectrum, RSE roles may look similar to a traditional research role. At the other extreme, they resemble that of a software engineer in industry. Most RSE roles inhabit the space between these two extremes. Therefore, providing a straightforward, comprehensive definition of what an RSE does and what experience, skills and competencies are required to become one is challenging. In this community paper we define the broad notion of what an RSE is, explore the different types of work they undertake, and define a list of fundamental competencies as well as values that define the general profile of an RSE. On this basis, we elaborate on the progression of these skills along different dimensions, looking at specific types of RSE roles, proposing recommendations for organisations, and giving examples of future specialisations. An appendix details how existing curricula fit into this framework.
Submitted 12 August, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Multi-scale observation of magnetotail reconnection onset: 2. microscopic dynamics
Authors:
K. J. Genestreti,
C. Farrugia,
S. Lu,
S. K. Vines,
P. H. Reiff,
T. -D. Phan,
D. N. Baker,
T. W. Leonard,
J. L. Burch,
S. T. Bingham,
I. J. Cohen,
J. R. Shuster,
D. J. Gershman,
C. G. Mouikis,
A. T. Rogers,
R. B. Torbert,
K. J. Trattner,
J. M. Webster,
L. -J. Chen,
B. L. Giles,
N. Ahmadi,
R. E. Ergun,
C. T. Russell,
R. J. Strangeway,
R. Nakamura
, et al. (1 additional authors not shown)
Abstract:
We analyze the local dynamics of magnetotail reconnection onset using Magnetospheric Multiscale (MMS) data. In conjunction with MMS, the macroscopic dynamics of this event were captured by a number of other ground and space-based observatories, as is reported in a companion paper. We find that the local dynamics of the onset were characterized by the rapid thinning of the cross-tail current sheet below the ion inertial scale, accompanied by the growth of flapping waves and the subsequent onset of electron tearing. Multiple kinetic-scale magnetic islands were detected coincident with the growth of an initially sub-Alfvénic, demagnetized tailward ion exhaust. The onset and rapid enhancement of parallel electron inflow at the exhaust boundary was a remote signature of the intensification of reconnection Earthward of the spacecraft. Two secondary reconnection sites are found embedded within the exhaust from a primary X-line. The primary X-line was designated as such on the basis that (1) while multiple jet reversals were observed in the current sheet, only one reversal of the electron inflow was observed at the high-latitude exhaust boundary, (2) the reconnection electric field was roughly 5 times larger at the primary X-line than the secondary X-lines, and (3) energetic electron fluxes increased and transitioned from anti-field-aligned to isotropic during the primary X-line crossing, indicating a change in magnetic topology. The results are consistent with the idea that a primary X-line mediates the reconnection of lobe magnetic field lines and accelerates electrons more efficiently than its secondary X-line counterparts.
Submitted 9 November, 2023;
originally announced November 2023.
-
Multi-scale observation of magnetotail reconnection onset: 1. macroscopic dynamics
Authors:
K. J. Genestreti,
C. Farrugia,
S. Lu,
S. K. Vines,
P. H. Reiff,
T. -D. Phan,
D. N. Baker,
T. W. Leonard,
J. L. Burch,
S. T. Bingham,
I. J. Cohen,
J. R. Shuster,
D. J. Gershman,
C. G. Mouikis,
A. T. Rogers,
R. B. Torbert,
K. J. Trattner,
J. M. Webster,
L. -J. Chen,
B. L. Giles,
N. Ahmadi,
R. E. Ergun,
C. T. Russell,
R. J. Strangeway,
R. Nakamura
Abstract:
We analyze a magnetotail reconnection onset event on 3 July 2017 that was observed under otherwise quiescent magnetospheric conditions by a fortuitous conjunction of six space and ground-based observatories. The study investigates the large-scale coupling of the solar wind - magnetosphere system that precipitated the onset of the magnetotail reconnection, focusing on the processes that thinned and stretched the cross-tail current layer in the absence of significant flux loading during a two-hour-long preconditioning phase. It is demonstrated with data (1) in the upstream solar wind, (2) at the low-latitude magnetopause, (3) in the high-latitude polar cap, and (4) in the magnetotail that the typical picture of solar wind-driven current sheet thinning via flux loading does not appear relevant for this particular event. We find that the current sheet thinning was, instead, initiated by a transient solar wind pressure pulse and that the current sheet thinning continued even as the magnetotail and solar wind pressures decreased. We suggest that field line curvature induced scattering (observed by Magnetospheric Multiscale (MMS)) and precipitation (observed by Defense Meteorological Satellite Program (DMSP)) of high-energy thermal protons may have evacuated plasma sheet thermal energy, which may require a thinning of the plasma sheet to preserve pressure equilibrium with the solar wind.
Submitted 9 November, 2023;
originally announced November 2023.
-
An Interdisciplinary Outlook on Large Language Models for Scientific Research
Authors:
James Boyko,
Joseph Cohen,
Nathan Fox,
Maria Han Veiga,
Jennifer I-Hsiu Li,
Jing Liu,
Bernardo Modenesi,
Andreas H. Rauch,
Kenneth N. Reid,
Soumi Tribedi,
Anastasia Visheratina,
Xin Xie
Abstract:
In this paper, we describe the capabilities and constraints of Large Language Models (LLMs) within disparate academic disciplines, aiming to delineate their strengths and limitations with precision. We examine how LLMs augment scientific inquiry, offering concrete examples such as accelerating literature review by summarizing vast numbers of publications, enhancing code development through automated syntax correction, and refining the scientific writing process. Simultaneously, we articulate the challenges LLMs face, including their reliance on extensive and sometimes biased datasets, and the potential ethical dilemmas stemming from their use. Our critical discussion extends to the varying impacts of LLMs across fields, from the natural sciences, where they help model complex biological sequences, to the social sciences, where they can parse large-scale qualitative data. We conclude by offering a nuanced perspective on how LLMs can be both a boon and a boundary to scientific progress.
Submitted 3 November, 2023;
originally announced November 2023.
-
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Authors:
Traian Rebedea,
Razvan Dinu,
Makesh Sreedhar,
Christopher Parisien,
Jonathan Cohen
Abstract:
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or rails for short) are a specific way of controlling the output of an LLM, such as not talking about topics considered harmful, following a predefined dialogue path, using a particular language style, and more. There are several mechanisms that allow LLM providers and developers to add guardrails that are embedded into a specific model at training, e.g. using model alignment. Differently, using a runtime inspired from dialogue management, NeMo Guardrails allows developers to add programmable rails to LLM applications - these are user-defined, independent of the underlying LLM, and interpretable. Our initial results show that the proposed approach can be used with several LLM providers to develop controllable and safe LLM applications using programmable rails.
Submitted 16 October, 2023;
originally announced October 2023.
-
Klingen Vectors for Depth Zero Supercuspidals of $GSp(4)$
Authors:
Jonathan Cohen
Abstract:
Let $F$ be a non-archimedean local field of characteristic zero and $(π, V)$ a depth zero, irreducible, supercuspidal representation of $GSp(4, F)$. We calculate the dimensions of the spaces of Klingen-invariant vectors in $V$ of level $\mathfrak{p}^n$ for all $n\geq 0 $.
Submitted 14 October, 2023;
originally announced October 2023.
-
Relational Constraints On Neural Networks Reproduce Human Biases towards Abstract Geometric Regularity
Authors:
Declan Campbell,
Sreejan Kumar,
Tyler Giallanza,
Jonathan D. Cohen,
Thomas L. Griffiths
Abstract:
Uniquely among primates, humans possess a remarkable capacity to recognize and manipulate abstract structure in the service of task goals across a broad range of behaviors. One illustration of this is in the visual perception of geometric forms. Studies have shown a uniquely human bias toward geometric regularity, with task performance enhanced for more regular and symmetric forms compared to their geometrically irregular counterparts. Such studies conclude that this behavior implies the existence of discrete symbolic structure in human mental representations, and that replicating such behavior in neural network architectures will require mechanisms for symbolic processing. In this study, we argue that human biases towards geometric regularity can be reproduced in neural networks, without explicitly providing them with symbolic machinery, by augmenting them with an architectural constraint that enables the system to discover and manipulate relational structure. When trained with the appropriate curriculum, this model exhibits human-like biases towards symmetry and regularity in two distinct tasks involving abstract geometric reasoning. Our findings indicate that neural networks, when equipped with the necessary training objectives and architectural elements, can exhibit human-like regularity biases and generalization. This approach provides insights into the neural mechanisms underlying geometric reasoning and offers an alternative to prevailing symbolic "Language of Thought" models in this domain.
Submitted 29 September, 2023;
originally announced September 2023.
-
Stochastic Deep Koopman Model for Quality Propagation Analysis in Multistage Manufacturing Systems
Authors:
Zhiyi Chen,
Harshal Maske,
Huanyi Shui,
Devesh Upadhyay,
Michael Hopka,
Joseph Cohen,
Xingjian Lai,
Xun Huan,
Jun Ni
Abstract:
The modeling of multistage manufacturing systems (MMSs) has attracted increased attention from both academia and industry. Recent advancements in deep learning methods provide an opportunity to accomplish this task with reduced cost and expertise. This study introduces a stochastic deep Koopman (SDK) framework to model the complex behavior of MMSs. Specifically, we present a novel application of Koopman operators to propagate critical quality information extracted by variational autoencoders. Through this framework, we can effectively capture the general nonlinear evolution of product quality using a transferred linear representation, thus enhancing the interpretability of the data-driven model. To evaluate the performance of the SDK framework, we carried out a comparative study on an open-source dataset. The main findings of this paper are as follows. Our results indicate that SDK surpasses other popular data-driven models in accuracy when predicting stagewise product quality within the MMS. Furthermore, the unique linear propagation property in the stochastic latent space of SDK enables traceability for quality evolution throughout the process, thereby facilitating the design of root cause analysis schemes. Notably, the proposed framework requires minimal knowledge of the underlying physics of production lines. It serves as a virtual metrology tool that can be applied to various MMSs, contributing to the ultimate goal of Zero Defect Manufacturing.
Submitted 18 September, 2023;
originally announced September 2023.
-
The Relational Bottleneck as an Inductive Bias for Efficient Abstraction
Authors:
Taylor W. Webb,
Steven M. Frankland,
Awni Altabaa,
Simon Segert,
Kamesh Krishnamurthy,
Declan Campbell,
Jacob Russin,
Tyler Giallanza,
Zack Dulberg,
Randall O'Reilly,
John Lafferty,
Jonathan D. Cohen
Abstract:
A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This has often been framed in terms of a dichotomy between connectionist and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. In that approach, neural networks are constrained via their architecture to focus on relations between perceptual inputs, rather than the attributes of individual inputs. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.
Submitted 1 May, 2024; v1 submitted 12 September, 2023;
originally announced September 2023.
-
Parameter identifiability and model selection for partial differential equation models of cell invasion
Authors:
Yue Liu,
Kevin Suh,
Philip K. Maini,
Daniel J. Cohen,
Ruth E. Baker
Abstract:
When employing mechanistic models to study biological phenomena, practical parameter identifiability is important for making accurate predictions across a wide range of unseen scenarios, as well as for understanding the underlying mechanisms. In this work we use a profile likelihood approach to investigate parameter identifiability for four extensions of the Fisher--KPP model, given experimental data from a cell invasion assay. We show that more complicated models tend to be less identifiable, with parameter estimates being more sensitive to subtle differences in experimental procedures, and that they require more data to be practically identifiable. As a result, we suggest that parameter identifiability should be considered alongside goodness-of-fit and model complexity as criteria for model selection.
Submitted 18 October, 2023; v1 submitted 4 September, 2023;
originally announced September 2023.
-
The QUATRO Application Suite: Quantum Computing for Models of Human Cognition
Authors:
Raghavendra Pradyumna Pothukuchi,
Leon Lufkin,
Yu Jun Shen,
Alejandro Simon,
Rome Thorstenson,
Bernardo Eilert Trevisan,
Michael Tu,
Mudi Yang,
Ben Foxman,
Viswanatha Srinivas Pothukuchi,
Gunnar Epping,
Thi Ha Kyaw,
Bryant J Jongkees,
Yongshan Ding,
Jerome R Busemeyer,
Jonathan D Cohen,
Abhishek Bhattacharjee
Abstract:
Research progress in quantum computing has, thus far, focused on a narrow set of application domains. Expanding the suite of quantum application domains is vital for the discovery of new software toolchains and architectural abstractions. In this work, we unlock a new class of applications ripe for quantum computing research -- computational cognitive modeling. Cognitive models are critical to understanding and replicating human intelligence. Our work connects computational cognitive models to quantum computer architectures for the first time. We release QUATRO, a collection of quantum computing applications from cognitive models. The development and execution of QUATRO shed light on gaps in the quantum computing stack that need to be closed to ease programming and drive performance. Among several contributions, we propose and study ideas pertaining to quantum cloud scheduling (using data from gate- and annealing-based quantum computers), parallelization, and more. In the long run, we expect our research to lay the groundwork for more versatile quantum computer systems in the future.
Submitted 8 December, 2023; v1 submitted 1 September, 2023;
originally announced September 2023.
-
Reimagining Heliophysics: A bold new vision for the next decade and beyond
Authors:
Ian J. Cohen,
Dan Baker,
Jacob Bortnik,
Pontus Brandt,
Jim Burch,
Amir Caspi,
George Clark,
Ofer Cohen,
Craig DeForest,
Gordon Emslie,
Matina Gkioulidou,
Alexa Halford,
Aleida Higginson,
Allison Jaynes,
Kristopher Klein,
Craig Kletzing,
Ryan McGranaghan,
David Miles,
Romina Nikoukar,
Katariina Nykyrii,
Larry Paxton,
Louise Prockter,
Harlan Spence,
William H. Swartz,
Drew L. Turner
, et al. (3 additional authors not shown)
Abstract:
The field of Heliophysics has a branding problem. We need an answer to the question: ``What is Heliophysics?'', the answer to which should clearly and succinctly define our science in a compelling way that simultaneously introduces a sense of wonder and exploration into our science and our missions. Unfortunately, recent over-reliance on space weather to define our field, as opposed to simply using it as a practical and relatable example of applied Heliophysics science, narrows the scope of what solar and space physics is and diminishes its fundamental importance. Moving forward, our community needs to be bold and unabashed in our definition of Heliophysics and its big questions. We should emphasize the general and fundamental importance and excitement of our science with a new mindset that generalizes and expands the definition of Heliophysics to include new ``frontiers'' of increasing interest to the community. Heliophysics should be unbound from its current confinement to the Sun-Earth connection and expanded to studies of the fundamental nature of space plasma physics across the solar system and greater cosmos. Finally, we need to come together as a community to advance our science by envisioning, prioritizing, and supporting -- with a unified voice -- a set of bold new missions that target compelling science questions, even if they do not explore the traditional Sun- and Earth-centric aspects of Heliophysics science. Such new, large missions to expand the frontiers and scope of Heliophysics science can be the key to galvanizing the public and policymakers to support the overall Heliophysics program.
Submitted 22 August, 2023;
originally announced August 2023.
-
The case for studying other planetary magnetospheres and atmospheres in Heliophysics
Authors:
Ian J. Cohen,
Chris Arridge,
Abigail Azari,
Chris Bard,
George Clark,
Frank Crary,
Shannon Curry,
Peter Delamere,
Ryan M. Dewey,
Gina A. DiBraccio,
Chuanfei Dong,
Alexander Drozdov,
Austin Egert,
Rachael Filwett,
Jasper Halekas,
Alexa Halford,
Andréa Hughes,
Katherine Garcia-Sage,
Matina Gkioulidou,
Charlotte Goetz,
Cesare Grava,
Michael Hirsch,
Hans Leo F. Huybrighs,
Peter Kollmann,
Laurent Lamy
, et al. (15 additional authors not shown)
Abstract:
Heliophysics is the field that "studies the nature of the Sun, and how it influences the very nature of space - and, in turn, the atmospheres of planetary bodies and the technology that exists there." However, NASA's Heliophysics Division tends to limit study of planetary magnetospheres and atmospheres to only those of Earth. This leaves exploration and understanding of space plasma physics at other worlds to the purview of the Planetary Science and Astrophysics Divisions. This is detrimental to the study of space plasma physics in general since, although some cross-divisional funding opportunities do exist, vital elements of space plasma physics can be best addressed by extending the expertise of Heliophysics scientists to other stellar and planetary magnetospheres. However, the diverse worlds within the solar system provide crucial environmental conditions that are not replicated at Earth but can provide deep insight into fundamental space plasma physics processes. Studying planetary systems with Heliophysics objectives, comprehensive instrumentation, and new grant opportunities for analysis and modeling would enable a novel understanding of fundamental and universal processes of space plasma physics. As such, the Heliophysics community should be prepared to consider, prioritize, and fund dedicated Heliophysics efforts to planetary targets to specifically study space physics and aeronomy objectives.
Submitted 24 August, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks
Authors:
Ryan Pyle,
Sebastian Musslick,
Jonathan D. Cohen,
Ankit B. Patel
Abstract:
A key property of neural networks (both biological and artificial) is how they learn to represent and manipulate input information in order to solve a task. Different types of representations may be suited to different types of tasks, making identifying and understanding learned representations a critical part of understanding and designing useful networks. In this paper, we introduce a new pseudo-kernel based tool for analyzing and predicting learned representations, based only on the initial conditions of the network and the training curriculum. We validate the method on a simple test case, before demonstrating its use on a question about the effects of representational learning on sequential single versus concurrent multitask performance. We show that our method can be used to predict the effects of the scale of weight initialization and training curriculum on representational learning and downstream concurrent multitasking performance.
Submitted 14 July, 2023;
originally announced July 2023.
-
Autoparametric resonance extending the bit-flip time of a cat qubit up to 0.3 s
Authors:
Antoine Marquet,
Antoine Essig,
Joachim Cohen,
Nathanaël Cottet,
Anil Murani,
Emanuele Albertinale,
Simon Dupouy,
Audrey Bienfait,
Théau Peronnin,
Sébastien Jezouin,
Raphaël Lescanne,
Benjamin Huard
Abstract:
Cat qubits, for which logical $|0\rangle$ and $|1\rangle$ are coherent states $|\pmα\rangle$ of a harmonic mode, offer a promising route towards quantum error correction. Using dissipation to our advantage so that photon pairs of the harmonic mode are exchanged with single photons of its environment, it is possible to stabilize the logical states and exponentially increase the bit-flip time of the cat qubit with the photon number $|α|^2$. Large two-photon dissipation rate $κ_2$ ensures fast qubit manipulation and short error correction cycles, which are instrumental to correct the remaining phase-flip errors in a repetition code of cat qubits. Here we introduce and operate an autoparametric superconducting circuit that couples a mode containing the cat qubit to a lossy mode whose frequency is set at twice that of the cat mode. This passive coupling does not require a parametric pump and reaches a rate $κ_2/2π\approx 2~\mathrm{MHz}$. With such a strong two-photon dissipation, bit-flip errors of the autoparametric cat qubit are prevented for a characteristic time up to 0.3~s with only a mild impact on phase-flip errors. Besides, we illustrate how the phase of a quantum superposition between $|α\rangle$ and $|-α\rangle$ can be arbitrarily changed by driving the harmonic mode while keeping the engineered dissipation active.
Submitted 28 April, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Quantum control of a cat-qubit with bit-flip times exceeding ten seconds
Authors:
Ulysse Réglade,
Adrien Bocquet,
Ronan Gautier,
Joachim Cohen,
Antoine Marquet,
Emanuele Albertinale,
Natalia Pankratova,
Mattis Hallén,
Felix Rautschke,
Lev-Arcady Sellem,
Pierre Rouchon,
Alain Sarlette,
Mazyar Mirrahimi,
Philippe Campagne-Ibarcq,
Raphaël Lescanne,
Sébastien Jezouin,
Zaki Leghtas
Abstract:
Quantum bits (qubits) are prone to several types of errors due to uncontrolled interactions with their environment. Common strategies to correct these errors are based on architectures of qubits involving daunting hardware overheads. A hopeful path forward is to build qubits that are inherently protected against certain types of errors, so that the overhead required to correct remaining ones is significantly reduced. However, the foreseen benefit rests on a severe condition: quantum manipulations of the qubit must not break the protection that has been so carefully engineered. A recent qubit - the cat-qubit - is encoded in the manifold of metastable states of a quantum dynamical system, thereby acquiring continuous and autonomous protection against bit-flips. Here, in a superconducting circuit experiment, we implement a cat-qubit with bit-flip times exceeding 10 seconds. This is a four order of magnitude improvement over previous cat-qubit implementations. We prepare and image quantum superposition states, and measure phase-flip times above 490 nanoseconds. Most importantly, we control the phase of these quantum superpositions without breaking bit-flip protection. This experiment demonstrates the compatibility of quantum control and inherent bit-flip protection at an unprecedented level, showing the viability of these dynamical qubits for future quantum technologies.
Submitted 31 May, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Advanced methods for analyzing in-situ observations of magnetic reconnection
Authors:
H. Hasegawa,
M. R. Argall,
N. Aunai,
R. Bandyopadhyay,
N. Bessho,
I. J. Cohen,
R. E. Denton,
J. C. Dorelli,
J. Egedal,
S. A. Fuselier,
P. Garnier,
V. Genot,
D. B. Graham,
K. J. Hwang,
Y. V. Khotyaintsev,
D. B. Korovinskiy,
B. Lavraud,
Q. Lenouvel,
T. C. Li,
Y. -H. Liu,
B. Michotte de Welle,
T. K. M. Nakamura,
D. S. Payne,
S. M. Petrinec,
Y. Qi
, et al. (11 additional authors not shown)
Abstract:
There is ample evidence for magnetic reconnection in the solar system, but it is a nontrivial task to visualize, to determine the proper approaches and frames to study, and in turn to elucidate the physical processes at work in reconnection regions from in-situ measurements of plasma particles and electromagnetic fields. Here an overview is given of a variety of single- and multi-spacecraft data analysis techniques that are key to revealing the context of in-situ observations of magnetic reconnection in space and for detecting and analyzing the diffusion regions where ions and/or electrons are demagnetized. We focus on recent advances in the era of the Magnetospheric Multiscale mission, which has made electron-scale, multi-point measurements of magnetic reconnection in and around Earth's magnetosphere.
Submitted 24 June, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Particle acceleration by magnetic reconnection in geospace
Authors:
Mitsuo Oka,
Joachim Birn,
Jan Egedal,
Fan Guo,
Robert E. Ergun,
Drew L. Turner,
Yuri Khotyaintsev,
Kyoung-Joo Hwang,
Ian J. Cohen,
James F. Drake
Abstract:
Particles are accelerated to very high, non-thermal energies during explosive energy-release phenomena in space, solar, and astrophysical plasma environments. While it has been established that magnetic reconnection plays an important role in the dynamics of Earth's magnetosphere, it remains unclear how magnetic reconnection can further explain particle acceleration to non-thermal energies. Here we review recent progress in our understanding of particle acceleration by magnetic reconnection in Earth's magnetosphere. With improved resolutions, recent spacecraft missions have enabled detailed studies of particle acceleration at various structures such as the diffusion region, separatrix, jets, magnetic islands (flux ropes), and dipolarization front. With the guiding-center approximation of particle motion, many studies have discussed the relative importance of the parallel electric field as well as the Fermi and betatron effects. However, in order to fully understand the particle acceleration mechanism and further compare with particle acceleration in solar and astrophysical plasma environments, there is a need for further investigation of, for example, energy partition and the precise role of turbulence.
Submitted 21 July, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Multi-point Assessment of the Kinematics of Shocks (MAKOS): A Heliophysics Mission Concept Study
Authors:
Katherine A. Goodrich,
Lynn B. Wilson III,
Steven Schwartz,
Ian J. Cohen,
Drew L. Turner,
Phyllis Whittlesey,
Amir Caspi,
Randall Rose,
Keith Smith
Abstract:
Collisionless shocks are fundamental processes that are ubiquitous in space plasma physics throughout the Heliosphere and most astrophysical environments. Earth's bow shock and interplanetary shocks at 1 AU offer the most readily accessible opportunities to advance our understanding of the nature of collisionless shocks via fully-instrumented, in situ observations. One major outstanding question pertains to the energy budget of collisionless shocks, particularly how exactly collisionless shocks convert incident kinetic bulk flow energy into thermalization (heating), suprathermal particle acceleration, and a variety of plasma waves, including nonlinear structures. Furthermore, it remains unknown how those energy conversion processes change for different shock orientations (e.g., quasi-parallel vs. quasi-perpendicular) and driving conditions (upstream Alfvénic and fast Mach numbers, plasma beta, etc.). Required to address these questions are multipoint observations enabling direct measurement of the necessary plasmas, energetic particles, and electric and magnetic fields and waves, all simultaneously from upstream, downstream, and at the shock transition layer with observatory separations at ion to magnetohydrodynamic (MHD) scales. Such a configuration of spacecraft with specifically-designed instruments has never been available, and this white paper describes a conceptual mission design -- MAKOS -- to address these outstanding questions and advance our knowledge of the nature of collisionless shocks.
Submitted 8 June, 2023;
originally announced June 2023.
-
Systematic Visual Reasoning through Object-Centric Relational Abstraction
Authors:
Taylor W. Webb,
Shanka Subhra Mondal,
Jonathan D. Cohen
Abstract:
Human visual reasoning is characterized by an ability to identify abstract patterns from only a small number of examples, and to systematically generalize those patterns to novel inputs. This capacity depends in large part on our ability to represent complex visual inputs in terms of both objects and relations. Recent work in computer vision has introduced models with the capacity to extract object-centric representations, leading to the ability to process multi-object visual inputs, but falling short of the systematic generalization displayed by human reasoning. Other recent models have employed inductive biases for relational abstraction to achieve systematic generalization of learned abstract rules, but have generally assumed the presence of object-focused inputs. Here, we combine these two approaches, introducing Object-Centric Relational Abstraction (OCRA), a model that extracts explicit representations of both objects and abstract relations, and achieves strong systematic generalization in tasks (including a novel dataset, CLEVR-ART, with greater visual complexity) involving complex visual displays.
Submitted 10 November, 2023; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Determinantal Point Process Attention Over Grid Cell Code Supports Out of Distribution Generalization
Authors:
Shanka Subhra Mondal,
Steven Frankland,
Taylor Webb,
Jonathan D. Cohen
Abstract:
Deep neural networks have made tremendous gains in emulating human-like intelligence, and have been used increasingly as ways of understanding how the brain may solve the complex computational problems on which this relies. However, these still fall short of, and therefore fail to provide insight into, how the brain supports strong forms of generalization of which humans are capable. One such case is out-of-distribution (OOD) generalization, that is, successful performance on test examples that lie outside the distribution of the training set. Here, we identify properties of processing in the brain that may contribute to this ability. We describe a two-part algorithm that draws on specific features of neural computation to achieve OOD generalization, and provide a proof of concept by evaluating performance on two challenging cognitive tasks. First, we draw on the fact that the mammalian brain represents metric spaces using grid cell code (e.g., in the entorhinal cortex): abstract representations of relational structure, organized in recurring motifs that cover the representational space. Second, we propose an attentional mechanism that operates over the grid cell code using a Determinantal Point Process (DPP), which we call DPP attention (DPP-A) -- a transformation that ensures maximum sparseness in the coverage of that space. We show that a loss function that combines standard task-optimized error with DPP-A can exploit the recurring motifs in the grid cell code, and can be integrated with common architectures to achieve strong OOD generalization performance on analogy and arithmetic tasks. This provides both an interpretation of how the grid cell code in the mammalian brain may contribute to generalization performance, and at the same time a potential means for improving such capabilities in artificial neural networks.
Submitted 23 January, 2024; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Generalizations of Bertrand's Postulate to Sums of Any Number of Primes
Authors:
Joel E. Cohen
Abstract:
In 1845, Bertrand conjectured that twice any prime strictly exceeds the next prime. Tchebichef proved Bertrand's postulate in 1850. In 1934, Ishikawa proved a stronger result: the sum of any two consecutive primes strictly exceeds the next prime, except for the only equality $2+3=5$. This observation is a special case of a more general result, perhaps not previously noticed: if $p_n$ denotes the $n$th prime, $n=1, 2, \ldots$, with $p_1=2, p_2=3, \ldots$, and if $c_1, \ldots, c_g$ are nonnegative integers (not necessarily distinct), and $d_1, \ldots, d_h$ are positive integers (not necessarily distinct), and $g>h\ge 1$, then there exists a positive integer $N$ such that $p_{n-c_1}+p_{n-c_2}+\cdots +p_{n-c_g}>p_{n+d_1}+\cdots +p_{n+d_h}$ for all $n\ge N$. We prove this result using only the prime number theorem. For any instance of this result, we sketch a way to find the least possible $N$. We give some numerical results and unanswered questions.
Submitted 5 May, 2023;
originally announced May 2023.
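The inequality stated in this abstract can be explored numerically. The following is a minimal sketch, not the paper's method: the helper names `primes_up_to` and `failures` and the search bound of 100,000 are assumptions made for illustration. It lists the indices $n$ within a finite range at which $p_{n-c_1}+\cdots+p_{n-c_g}>p_{n+d_1}+\cdots+p_{n+d_h}$ fails. A finite scan of this kind only suggests a candidate for the least $N$; it proves nothing about larger $n$.

```python
def primes_up_to(limit):
    """Return all primes <= limit (limit >= 2) via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Clear every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def failures(primes, c, d):
    """List the 1-based indices n at which
    sum_i p_{n - c_i} > sum_j p_{n + d_j} does NOT hold,
    over every n for which all required primes are available."""
    p = [None] + primes          # shift so that p[1] == 2, p[2] == 3, ...
    lowest = max(c) + 1          # need n - c_i >= 1
    highest = len(primes) - max(d)  # need n + d_j within the table
    return [
        n
        for n in range(lowest, highest + 1)
        if sum(p[n - ci] for ci in c) <= sum(p[n + dj] for dj in d)
    ]


PRIMES = primes_up_to(100_000)  # hypothetical search bound

# Bertrand's postulate: 2 * p_n > p_{n+1}, i.e. c = (0, 0), d = (1,).
bertrand_failures = failures(PRIMES, c=(0, 0), d=(1,))

# Ishikawa's result, shifted by one index: p_{n-1} + p_n > p_{n+1},
# i.e. c = (1, 0), d = (1,). The only exception is 2 + 3 = 5 at n = 2.
ishikawa_failures = failures(PRIMES, c=(1, 0), d=(1,))

print(bertrand_failures, ishikawa_failures)
```

Within the scanned range, the largest failing index plus one is a lower bound on the least $N$ for a given choice of $c_1,\ldots,c_g$ and $d_1,\ldots,d_h$.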
-
Beyond Transformers for Function Learning
Authors:
Simon Segert,
Jonathan Cohen
Abstract:
The ability to learn and predict simple functions is a key aspect of human intelligence. Recent works have started to explore this ability using transformer architectures; however, it remains unclear whether this is sufficient to recapitulate the extrapolation abilities of people in this domain. Here, we propose to address this gap by augmenting the transformer architecture with two simple inductive learning biases that are directly adapted from recent models of abstract reasoning in cognitive science. The results we report demonstrate that these biases are helpful in the context of large neural network models, as well as shed light on the types of inductive learning biases that may contribute to human abilities in extrapolation.
Submitted 19 April, 2023;
originally announced April 2023.
-
An integrated online radioassay data storage and analytics tool for nEXO
Authors:
R. H. M. Tsang,
A. Piepke,
S. Al Kharusi,
E. Angelico,
I. J. Arnquist,
A. Atencio,
I. Badhrees,
J. Bane,
V. Belov,
E. P. Bernard,
A. Bhat,
T. Bhatta,
A. Bolotnikov,
P. A. Breur,
J. P. Brodsky,
E. Brown,
T. Brunner,
E. Caden,
G. F. Cao,
L. Q. Cao,
D. Cesmecioglu,
C. Chambers,
E. Chambers,
B. Chana,
S. A. Charlebois
, et al. (135 additional authors not shown)
Abstract:
Large-scale low-background detectors are increasingly used in rare-event searches as experimental collaborations push for enhanced sensitivity. However, building such detectors, in practice, creates an abundance of radioassay data especially during the conceptual phase of an experiment when hundreds of materials are screened for radiopurity. A tool is needed to manage and make use of the radioassay screening data to quantitatively assess detector design options. We have developed a Materials Database Application for the nEXO experiment to serve this purpose. This paper describes this database, explains how it functions, and discusses how it streamlines the design of the experiment.
Submitted 20 June, 2023; v1 submitted 12 April, 2023;
originally announced April 2023.
-
The Effect of Counterfactuals on Reading Chest X-rays
Authors:
Joseph Paul Cohen,
Rupert Brooks,
Sovann En,
Evan Zucker,
Anuj Pareek,
Matthew Lungren,
Akshay Chaudhari
Abstract:
This study evaluates the effect of counterfactual explanations on the interpretation of chest X-rays. We conduct a reader study with two radiologists assessing 240 chest X-ray predictions to rate their confidence that the model's prediction is correct using a 5-point scale. Half of the predictions are false positives. Each prediction is explained twice, once using traditional attribution methods and once with a counterfactual explanation. The overall results indicate that counterfactual explanations allow a radiologist to have more confidence in true positive predictions compared to traditional approaches (0.15$\pm$0.95 with p=0.01) with only a small increase in false positive predictions (0.04$\pm$1.06 with p=0.57). We observe that the specific prediction tasks of Mass and Atelectasis appear to benefit the most compared to other tasks.
Submitted 2 April, 2023;
originally announced April 2023.
-
Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers
Authors:
Awni Altabaa,
Taylor Webb,
Jonathan Cohen,
John Lafferty
Abstract:
An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from object-level features. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where consistent improvements in performance and sample efficiency are observed.
Submitted 12 April, 2024; v1 submitted 31 March, 2023;
originally announced April 2023.