Material Transport in Protoplanetary Discs with Massive Embedded Planets

Authors: Hannah J. Petrovic, Richard A. Booth, Cathie J. Clarke

Abstract: Vertical gas and dust flows in protoplanetary discs waft material above the midplane region in the presence of a protoplanet. This motion may alter the delivery of dust to the planet and its circumplanetary disc, as well as through a planetary-induced gap region and hence the inner disc chemistry. Here, we investigate the impact of a massive embedded planet on this material transport through the gap region. We use 3D global hydrodynamic simulations run using FARGO3D with gas and dust species to investigate the dust filtration and the origin of material that can make it through the gap. We find small dust particles can pass through the gap as expected from results in 2D, and that this can be considered in two parts - filtering due to the planetary-induced pressure maximum, and filtering due to accretion onto the planet. When gas accretion onto the planet is included, we find that the larger dust grains that cross the gap (i.e. those with $\mathrm{St} \sim 10^{-4}$) originate from regions near the mid-plane. We also find that dust and gas that enter the planet-carved gap region pass through the Hill sphere of the planet, where the temperature is likely to be strongly enhanced compared with the mid-plane regions from which this material originated. Considering the application of our simulations to a Jupiter-mass planet at $\sim 100\ \mathrm{AU}$, this suggests that CO ice is very likely to desorb from grains in the close proximity of the planet, without requiring any fine-tuning of the planet's location with respect to the CO snowline.

Submitted 24 September, 2024; originally announced September 2024.

Comments: 15 pages, 12 figures; Accepted for publication in MNRAS

arXiv:2408.08896 [pdf, other]

LLMJudge: LLMs for Relevance Judgments

Authors: Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, Guglielmo Faggioli

Abstract: The LLMJudge challenge is organized as part of the LLM4Eval workshop at SIGIR 2024. Test collections are essential for evaluating information retrieval (IR) systems. The evaluation and tuning of a search system is largely based on relevance labels, which indicate whether a document is useful for a specific search and user. However, collecting relevance judgments on a large scale is costly and resource-intensive. Consequently, typical experiments rely on third-party labelers who may not always produce accurate annotations. The LLMJudge challenge aims to explore an alternative approach by using LLMs to generate relevance judgments. Recent studies have shown that LLMs can generate reliable relevance judgments for search systems. However, it remains unclear which LLMs can match the accuracy of human labelers, which prompts are most effective, how fine-tuned open-source LLMs compare to closed-source LLMs like GPT-4, whether there are biases in synthetically generated data, and if data leakage affects the quality of generated labels. This challenge will investigate these questions, and the collected data will be released as a package to support automatic relevance judgment research in information retrieval and search.

Submitted 9 August, 2024; originally announced August 2024.

Comments: LLMJudge Challenge Overview, 3 pages

arXiv:2408.05388 [pdf, other]

Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024

Authors: Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz

Abstract: The first edition of the workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) took place in July 2024, co-located with the ACM SIGIR Conference 2024 in the USA (SIGIR 2024). The aim was to bring information retrieval researchers together around the topic of LLMs for evaluation in information retrieval that gathered attention with the advancement of large language models and generative AI. Given the novelty of the topic, the workshop was focused around multi-sided discussions, namely panels and poster sessions of the accepted proceedings papers.

Submitted 9 August, 2024; originally announced August 2024.

Comments: LLM4Eval Workshop Report

arXiv:2408.00848 [pdf, other]

Photoevaporation of protoplanetary discs with PLUTO+PRIZMO I. Lower X-ray-driven mass-loss rates due to enhanced cooling

Authors: Andrew D. Sellek, Tommaso Grassi, Giovanni Picogna, Christian Rab, Cathie J. Clarke, Barbara Ercolano

Abstract: Context: Photoevaporation is an important process for protoplanetary disc dispersal but there has so far been a lack of consensus from simulations over the mass-loss rates and the most important part of the high-energy spectrum for driving the wind. Aims: We aim to isolate the origins of these discrepancies through carefully-benchmarked hydrodynamic simulations of X-ray photoevaporation with time-dependent thermochemistry calculated on the fly. Methods: We conduct hydrodynamic simulations with PLUTO where the thermochemistry is calculated using PRIZMO. We explore the contribution of certain key microphysical processes and the impact of using different spectra used previously in literature studies. Results: We find that additional cooling results from the excitation of O by neutral H, which leads to dramatically reduced mass-loss across the disc compared to previous X-ray photoevaporation models, with an integrated rate of 10^-9 Msun/yr. Such rates would allow for longer-lived discs than previously expected from population synthesis. An alternative spectrum with less soft X-ray produces mass-loss rates around a factor of 2-3 times lower. The chemistry is significantly out of equilibrium, with the survival of H2 into the wind aided by advection. This leads to its role as the dominant coolant at 10s au - thus stabilising a larger radial temperature gradient across the wind - as well as providing a possible wind tracer.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 26 pages, 16 figures, Accepted 31st July 2024 for publication in A&A

arXiv:2407.18078 [pdf, other]

PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization

Authors: Christopher Clarke, Yuzhao Heng, Lingjia Tang, Jason Mars

Abstract: The recent emergence of Large Language Models (LLMs) has heralded a new era of human-AI interaction. These sophisticated models, exemplified by Chat-GPT and its successors, have exhibited remarkable capabilities in language understanding. However, as these LLMs have undergone exponential growth, a crucial dimension that remains understudied is the personalization of these models. Large foundation models such as GPT-3 etc. focus on creating a universal model that serves a broad range of tasks and users. This approach emphasizes the model's generalization capabilities, treating users as a collective rather than as distinct individuals. While practical for many common applications, this one-size-fits-all approach often fails to address the rich tapestry of human diversity and individual needs. To explore this issue we introduce the PEFT-U Benchmark: a new dataset for building and evaluating NLP models for user personalization. PEFT-U consists of a series of user-centered tasks containing diverse and individualized expressions where the preferences of users can potentially differ for the same input. Using PEFT-U, we explore the challenge of efficiently personalizing LLMs to accommodate user-specific preferences in the context of diverse user-centered tasks.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2406.14626 [pdf, other]

Inner walls or vortices? Crescent-shaped asymmetries in ALMA observations of protoplanetary discs

Authors: Álvaro Ribas, Cathie J. Clarke, Francesco Zagaria

Abstract: Crescent-shaped asymmetries are common in millimetre observations of protoplanetary discs and are usually attributed to vortices or dust overdensities. However, they often appear on a single side of the major axis and roughly symmetric about the minor axis, suggesting a geometric origin. In this work, we interpret such asymmetries as emission from the exposed inner cavity walls of inclined discs and use them to characterise their vertical extent. Here we focus on the discs around CIDA 9 and RY Tau, first modelling their observations in visibility space with a simple geometric prescription for the walls, and then exploring more detailed radiative transfer models. Accounting for the wall emission yields significantly better residuals than purely axisymmetric models, and we estimate the dust scale height of these systems to be 0.4 au at 37 au for CIDA 9 and 0.2 au at 12 au for RY Tau. Finally, we identify crescent-shaped asymmetries in twelve discs, nine of which have constraints on their orientation - in all cases, the asymmetry appears on the far-side of the disc, lending support to the hypothesis that they are due to their inner rims. Modelling this effect in larger samples of discs will help to build a statistical view of their vertical structure.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted for publication in MNRAS

arXiv:2406.11627 [pdf, other]

doi 10.3847/1538-4357/ad55c5

Seeing the unseen: a method to detect unresolved rings in protoplanetary disks

Authors: Chiara E. Scardoni, Richard A. Booth, Cathie J. Clarke, Giovanni P. Rosotti, Alvaro Ribas

Abstract: While high resolution ALMA observations reveal a wealth of substructure in protoplanetary discs, they remain incapable of resolving the types of small scale dust structures predicted, for example, by numerical simulations of the streaming instability. In this Letter, we propose a method to find evidence for unresolved, optically thick dusty rings in protoplanetary disks. We demonstrate that, in the presence of unresolved rings, the brightness of an inclined disc exhibits a distinctive emission peak at the minor axis. Furthermore, the azimuthal brightness depends on both the geometry of the rings and the dust optical properties; we can therefore use the azimuthal brightness variations to both detect unresolved rings and probe their properties. By analyzing the azimuthal brightness in the test-case of ring-like substructures formed by streaming instability, we show that the resulting peak is likely detectable by ALMA for typical disc parameters. Moreover, we present an analytic model that not only qualitatively but also quantitatively reproduces the peak found in the simulations, validating its applicability to infer the presence of unresolved rings in observations and characterize their optical properties and shape. This will contribute to the identification of disk regions where streaming instability (and thus planet formation) is occurring.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted by ApJ, 10 pages, 6 figures

arXiv:2406.05952 [pdf, other]

doi 10.1051/0004-6361/202450187

Angular momentum transport via gravitational instability in the Elias 2-27 disc

Authors: Cristiano Longarini, Giuseppe Lodato, Cathie J. Clarke, Jessica Speedie, Teresa Paneque-Carreno, Edoardo Arrigoni, Pietro Curone, Claudia Toci, Cassandra Hall

Abstract: Gravitational instability is thought to be one of the main drivers of angular momentum transport in young protoplanetary discs. The disc around Elias 2-27 offers a unique example of gravitational instability at work. It is young and massive, displaying two prominent spiral arms in dust continuum emission and global non-axisymmetric kinematic signatures in molecular line data. In this work, we used archival ALMA observations of $^{13}$CO line emission to measure the efficiency of angular momentum transport in the Elias 2-27 system through the kinematic signatures generated by gravitational instability, known as 'GI wiggles'. Assuming the angular momentum is transported by the observed spiral structure and leveraging previously-derived dynamical disc mass measurements, the amount of angular momentum transport we found corresponds to an $\alpha$-viscosity of $\alpha=0.038\pm0.018$. This value implies an accretion rate onto the central star of $\log_{10}\dot{M}_\star=-6.99\pm0.17\ \text{M}_\odot/\text{yr}$, which reproduces the observed value of $\log_{10}\dot{M}_{\star,\text{obs}}=-7.2\pm0.5\ \text{M}_\odot/\text{yr}$ very well. The excellent agreement we have found serves as further proof that gravitational instability is the main driver of angular momentum transport acting in this system.

Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted for publication in A&A letters, 6 pages, 3 images

arXiv:2406.05447 [pdf, other]

The PLATO Mission

Authors: Heike Rauer, Conny Aerts, Juan Cabrera, Magali Deleuil, Anders Erikson, Laurent Gizon, Mariejo Goupil, Ana Heras, Jose Lorenzo-Alvarez, Filippo Marliani, Cesar Martin-Garcia, J. Miguel Mas-Hesse, Laurence O'Rourke, Hugh Osborn, Isabella Pagano, Giampaolo Piotto, Don Pollacco, Roberto Ragazzoni, Gavin Ramsay, Stéphane Udry, Thierry Appourchaux, Willy Benz, Alexis Brandeker, Manuel Güdel, Eduardo Janot-Pacheco , et al. (801 additional authors not shown)

Abstract: PLATO (PLAnetary Transits and Oscillations of stars) is ESA's M3 mission designed to detect and characterise extrasolar planets and perform asteroseismic monitoring of a large number of stars. PLATO will detect small planets (down to $<2\,R_{\mathrm{Earth}}$) around bright stars (<11 mag), including terrestrial planets in the habitable zone of solar-like stars. With the complement of radial velocity observations from the ground, planets will be characterised for their radius, mass, and age with high accuracy (5 %, 10 %, 10 % for an Earth-Sun combination respectively). PLATO will provide us with a large-scale catalogue of well-characterised small planets up to intermediate orbital periods, relevant for a meaningful comparison to planet formation theories and to better understand planet evolution. It will make possible comparative exoplanetology to place our Solar System planets in a broader context. In parallel, PLATO will study (host) stars using asteroseismology, allowing us to determine the stellar properties with high accuracy, substantially enhancing our knowledge of stellar structure and evolution. The payload instrument consists of 26 cameras with 12 cm aperture each. For at least four years, the mission will perform high-precision photometric measurements. Here we review the science objectives, present PLATO's target samples and fields, provide an overview of expected core science performance as well as a description of the instrument and the mission profile at the beginning of the serial production of the flight cameras. PLATO is scheduled for launch at the end of 2026. This overview therefore provides a summary of the mission to the community in preparation for the upcoming operational phases.

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2405.08965 [pdf, other]

MTLLM: LLMs are Meaning-Typed Code Constructs

Authors: Jason Mars, Yiping Kang, Jayanaka L. Dantanarayana, Chandra Irugalbandara, Kugesan Sivasothynathan, Christopher Clarke, Baichuan Li, Lingjia Tang

Abstract: Programming with Generative AI (GenAI) models, which frequently involves using large language models (LLMs) to accomplish specific functionalities, has experienced significant growth in adoption. However, it remains a complex process, as developers often need to manually configure text inputs for LLMs, a practice known as prompt engineering, and subsequently translate the natural language outputs produced by LLMs back into symbolic code representations (values, types, etc.) that the code can understand. Although some infrastructures are proposed to facilitate prompt engineering, these tools are often complex and challenging for developers to adopt. Instead, this paper presents a simplified approach to integrating LLMs into programming through the introduction of an abstraction layer that hides the complexity of gluing traditional programming and LLMs together. Our approach utilizes the semantic richness in existing programs to automatically translate between the traditional programming languages and the natural language understood by LLMs, eliminating developer efforts such as prompt engineering, decreasing the overall complexity. Specifically in this paper, we design three novel code constructs coupled with an automated runtime management system that bridges the gap between traditional symbolic code and LLMs. We present a fully functional and production-grade implementation for our approach and compare it to SOTA LLM software development tools. We present real-world case studies demonstrating the efficacy of our proposed abstraction that seamlessly utilizes LLMs to solve problems in place of potentially complex traditional programming logic.

Submitted 14 October, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.03832 [pdf, other]

Guylingo: The Republic of Guyana Creole Corpora

Authors: Christopher Clarke, Roland Daynauth, Charlene Wilkinson, Hubert Devonish, Jason Mars

Abstract: While major languages often enjoy substantial attention and resources, the linguistic diversity across the globe encompasses a multitude of smaller, indigenous, and regional languages that lack the same level of computational support. One such region is the Caribbean. While commonly labeled as "English speaking", the ex-British Caribbean region consists of a myriad of Creole languages thriving alongside English. In this paper, we present Guylingo: a comprehensive corpus designed for advancing NLP research in the domain of Creolese (Guyanese English-lexicon Creole), the most widely spoken language in the culturally rich nation of Guyana. We first outline our framework for gathering and digitizing this diverse corpus, inclusive of colloquial expressions, idioms, and regional variations in a low-resource language. We then demonstrate the challenges of training and evaluating NLP models for machine translation in Creole. Lastly, we discuss the unique opportunities presented by recent NLP advancements for accelerating the formal adoption of Creole languages as official languages in the Caribbean.

Submitted 2 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted to NAACL 2024 Main Conference Special Theme Track: Languages of Latin America and The Caribbean

arXiv:2405.02178 [pdf, other]

Assessing and Verifying Task Utility in LLM-Powered Applications

Authors: Negar Arabzadeh, Siqing Huo, Nikhil Mehta, Qinqyun Wu, Chi Wang, Ahmed Awadallah, Charles L. A. Clarke, Julia Kiseleva

Abstract: The rapid development of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents, assisting humans in their daily tasks. However, a significant gap remains in assessing to what extent LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the need to verify utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval, a novel framework designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment, quantifying the utility of an application against the suggested criteria. We present a comprehensive analysis of the effectiveness and robustness of AgentEval for two open source datasets including Math Problem solving and ALFWorld House-hold related tasks. For reproducibility purposes, we make the data, code and all the logs publicly available at https://bit.ly/3w3yKcS .

Submitted 12 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.09015

arXiv:2404.18925 [pdf, other]

doi 10.1093/mnras/stae1094

Searching for planet-driven dust spirals in ALMA visibilities

Authors: Edward T. Stevenson, Álvaro Ribas, Jessica Speedie, Richard A. Booth, Cathie J. Clarke

Abstract: ALMA (Atacama Large Millimetre/submillimetre Array) observations of the thermal emission from protoplanetary disc dust have revealed a wealth of substructures that could evidence embedded planets, but planet-driven spirals, one of the more compelling lines of evidence, remain relatively rare. Existing works have focused on detecting these spirals using methods that operate in image space. Here, we explore the planet detection capabilities of fitting planet-driven spirals to disc observations directly in visibility space. We test our method on synthetic ALMA observations of planet-containing model discs for a range of disc/observational parameters, finding it significantly outperforms image residuals in identifying spirals in these observations and is able to identify spirals in regions of the parameter space in which no gaps are detected. These tests suggest that a visibility-space fitting approach warrants further investigation and may be able to find planet-driven spirals in observations that have not yet been found with existing approaches. We also test our method on six discs in the Taurus molecular cloud observed with ALMA at 1.33 mm, but find no evidence for planet-driven spirals. We find that the minimum planet masses necessary to drive detectable spirals range from $\approx$ 0.03 to 0.5 $M_{\text{Jup}}$ over orbital radii of 10 to 100 au, with planet masses below these thresholds potentially hiding in such disc observations. Conversely, we suggest that planets $\gtrsim$ 0.5 to 1 $M_{\text{Jup}}$ can likely be ruled out over orbital radii of $\approx$ 20 to 60 au on the grounds that we would have detected them if they were present.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 16 pages, 14 figures, 3 tables. Accepted 2024 April 18 for publication in MNRAS

arXiv:2404.16859 [pdf, other]

Rumour Evaluation with Very Large Language Models

Authors: Dahlia Shehata, Robin Cohen, Charles Clarke

Abstract: Conversational prompt-engineering-based large language models (LLMs) have enabled targeted control over the output creation, enhancing versatility, adaptability and ad hoc retrieval. From another perspective, digital misinformation has reached alarming levels. The anonymity, availability and reach of social media offer fertile ground for rumours to propagate. This work proposes to leverage the advancement of prompting-dependent LLMs to combat misinformation by extending the research efforts of the RumourEval task on its Twitter dataset. To this end, we employ two prompting-based LLM variants (GPT-3.5-turbo and GPT-4) to extend the two RumourEval subtasks: (1) veracity prediction, and (2) stance classification. For veracity prediction, three classification schemes are tested per GPT variant. Each scheme is tested in zero-, one- and few-shot settings. Our best results outperform the previous ones by a substantial margin. For stance classification, prompting-based approaches show comparable performance to prior results, with no improvement over finetuning methods. The rumour stance subtask is also extended beyond the original setting to allow multiclass classification. All of the generated predictions for both subtasks are equipped with confidence scores determining their trustworthiness degree according to the LLM, and post-hoc justifications for explainability and interpretability purposes. Our primary aim is AI for social good.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.16235 [pdf, other]

Inclusive studies of two- and three-nucleon short-range correlations in $^3$H and $^3$He

Authors: S. Li, S. N. Santiesteban, J. Arrington, R. Cruz-Torres, L. Kurbany, D. Abrams, S. Alsalmi, D. Androic, K. Aniol, T. Averett, C. Ayerbe Gayoso, J. Bane, S. Barcus, J. Barrow, A. Beck, V. Bellini, H. Bhatt, D. Bhetuwal, D. Biswas, D. Bulumulla, A. Camsonne, J. Castellanos, J. Chen, J-P. Chen, D. Chrisman , et al. (91 additional authors not shown)

Abstract: Inclusive electron scattering at carefully chosen kinematics can isolate scattering from short-range correlations (SRCs), produced through hard, short-distance interactions of nucleons in the nucleus. Because the two-nucleon (2N) SRCs arise from the same N-N interaction in all nuclei, the cross section in the SRC-dominated regime is identical up to an overall scaling factor, and the A/$^2$H cross section ratio is constant in this region. This scaling behavior has been used to identify SRC dominance and to map out the contribution of SRCs for a wide range of nuclei. We examine this scaling behavior at lower momentum transfers using new data on $^2$H, $^3$H, and $^3$He which show that the scaling region is larger than in heavy nuclei. Based on the improved scaling, especially for $^3$H/$^3$He, we examine the ratios at kinematics where three-nucleon SRCs may play an important role. The data for the largest initial nucleon momenta are consistent with isolation of scattering from 3N-SRCs, and suggest that the very-highest momentum nucleons in $^3$He have a nearly isospin-independent momentum configuration, or a small enhancement of the proton distribution.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.08137 [pdf, other]

Generative Information Retrieval Evaluation

Authors: Marwah Alaofi, Negar Arabzadeh, Charles L. A. Clarke, Mark Sanderson

Abstract: This paper is a draft of a chapter intended to appear in a forthcoming book on generative information retrieval, co-edited by Chirag Shah and Ryen White. In this chapter, we consider generative information retrieval evaluation from two distinct but interrelated perspectives. First, large language models (LLMs) themselves are rapidly becoming tools for evaluation, with current research indicating that LLMs may be superior to crowdsource workers and other paid assessors on basic relevance judgement tasks. We review past and ongoing related research, including speculation on the future of shared task initiatives, such as TREC, and a discussion on the continuing need for human assessments. Second, we consider the evaluation of emerging LLM-based generative information retrieval (GenIR) systems, including retrieval augmented generation (RAG) systems. We consider approaches that focus both on the end-to-end evaluation of GenIR systems and on the evaluation of a retrieval component as an element in a RAG system. Going forward, we expect the evaluation of GenIR systems to be at least partially based on LLM-based assessment, creating an apparent circularity, with a system seemingly evaluating its own output. We resolve this apparent circularity in two ways: 1) by viewing LLM-based assessment as a form of "slow search", where a slower IR system is used for evaluation and training of a faster production IR system; and 2) by recognizing a continuing need to ground evaluation in human assessment, even if the characteristics of that human assessment must change.

Submitted 16 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: Draft of a chapter intended to appear in a forthcoming book on generative information retrieval, co-edited by Chirag Shah and Ryen White

arXiv:2404.04044 [pdf, other]

A Comparison of Methods for Evaluating Generative IR

Authors: Negar Arabzadeh, Charles L. A. Clarke

Abstract: Information retrieval systems increasingly incorporate generative components. For example, in a retrieval augmented generation (RAG) system, a retrieval component might provide a source of ground truth, while a generative component summarizes and augments its responses. In other systems, a large language model (LLM) might directly generate responses without consulting a retrieval component. While there are multiple definitions of generative information retrieval (Gen-IR) systems, in this paper we focus on those systems where the system's response is not drawn from a fixed collection of documents or passages. The response to a query may be entirely new text. Since traditional IR evaluation methods break down under this model, we explore various methods that extend traditional offline evaluation approaches to the Gen-IR context. Offline IR evaluation traditionally employs paid human assessors, but increasingly LLMs are replacing human assessment, demonstrating capabilities similar or superior to crowdsourced labels. Given that Gen-IR systems do not generate responses from a fixed set, we assume that methods for Gen-IR evaluation must largely depend on LLM-generated labels. Along with methods based on binary and graded relevance, we explore methods based on explicit subtopics, pairwise preferences, and embeddings. We first validate these methods against human assessments on several TREC Deep Learning Track tasks; we then apply these methods to evaluate the output of several purely generative systems.
For each method we consider both its ability to act autonomously, without the need for human labels or other input, and its ability to support human auditing. To trust these methods, we must be assured that their results align with human assessments. In order to do so, evaluation criteria must be transparent, so that outcomes can be audited by human assessors.

Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.09780 [pdf, other]

doi 10.3847/1538-3881/ad34ae

Modeling JWST MIRI-MRS Observations of T Cha: Mid-IR Noble Gas Emission Tracing a Dense Disk Wind

Authors: Andrew D. Sellek, Naman S. Bajaj, Ilaria Pascucci, Cathie J. Clarke, Richard Alexander, Chengyan Xie, Giulia Ballabio, Dingshan Deng, Uma Gorti, Andras Gaspar, Jane Morrison

Abstract: [Ne II] 12.81 $\mu\mathrm{m}$ emission is a well-used tracer of protoplanetary disk winds due to its blueshifted line profile. MIRI-MRS recently observed T Cha, detecting this line along with lines of [Ne III], [Ar II] and [Ar III], with the [Ne II] and [Ne III] lines found to be extended while the [Ar II] was not. In this complementary work, we use these lines to address long-debated questions about protoplanetary disk winds regarding their mass-loss rate, the origin of their ionization, and the role of magnetically-driven winds as opposed to photoevaporation. To this end, we perform photoionization radiative transfer on simple hydrodynamic wind models to map the line emission. We compare the integrated model luminosities to those observed with MIRI-MRS to identify which models most closely reproduce the data and produce synthetic images from these to understand what information is captured by measurements of the line extents. Along with the low degree of ionization implied by the line ratios, the relative compactness of [Ar II] compared to [Ne II] is particularly constraining. This requires Ne II production by hard X-rays and Ar II production by soft X-rays (and/or EUV) in an extended ($\gtrsim 10$ au) wind that is shielded from soft X-rays - necessitating a dense wind with material launched on scales down to ~1 au. Such conditions could be produced by photoevaporation, whereas an extended MHD wind producing equal shielding would likely underpredict the line fluxes. However, a tenuous inner MHD wind may still contribute to shielding the extended wind.
This picture is consistent with constraints from spectrally-resolved line profiles.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 32 pages, 16 figures, Accepted 14/03/24 to the Astronomical Journal. Complementary modeling to Bajaj et al. 2024 (arXiv:2403.01060)

arXiv:2403.01060 [pdf, other]

JWST MIRI/MRS Observations of T Cha: Discovery of a Spatially Resolved Disk Wind

Authors: Naman S. Bajaj, Ilaria Pascucci, Uma Gorti, Richard Alexander, Andrew Sellek, Jane Morrison, Andras Gaspar, Cathie Clarke, Chengyan Xie, Giulia Ballabio, Dingshan Deng

Abstract: Understanding when and how circumstellar disks disperse is crucial to constrain planet formation and migration. Thermal winds powered by high-energy stellar photons have long been theorized to drive disk dispersal. However, evidence for these winds is currently based only on small (~3-6 km/s) blue-shifts in [Ne II] 12.81 um lines, which does not exclude MHD winds. We report JWST MIRI MRS spectro-imaging of T Cha, a disk with a large dust gap (~30 au in radius) and blue-shifted [Ne II] emission. We detect four forbidden noble gas lines, [Ar II], [Ar III], [Ne II], and [Ne III], of which [Ar III] is the first detection in any protoplanetary disk. We use line flux ratios to constrain the energy of the ionizing photons and find that Argon is ionized by EUV whereas Neon is most likely ionized by X-rays. After performing continuum and Point Spread Function (PSF) subtraction on the IFU cube, we discover a spatial extension in the [Ne II] emission off the disk continuum emission. This is the first spatially resolved [Ne II] disk wind emission. The mostly ionic spectrum of T Cha, in combination with the extended [Ne II] emission, points to an evolved stage for any inner MHD wind and is consistent with the existence of an outer thermal wind ionized and driven by high-energy stellar photons. This work acts as a pathfinder for future observations aiming at investigating disk dispersal using JWST.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 20 pages, 8 figures, Accepted for publication in AJ

arXiv:2402.09015 [pdf, other]

Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications

Authors: Negar Arabzadeh, Julia Kiseleva, Qingyun Wu, Chi Wang, Ahmed Awadallah, Victor Dibia, Adam Fourney, Charles Clarke

Abstract: The rapid development in the field of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents to assist humans in their daily tasks. However, a significant gap remains in assessing whether LLM-powered applications genuinely enhance user experience and task execution efficiency. This highlights the pressing need for methods to verify the utility of LLM-powered applications, particularly by ensuring alignment between the application's functionality and end-user needs. We introduce AgentEval, a novel framework (with an implementation for math problems) designed to simplify the utility verification process by automatically proposing a set of criteria tailored to the unique purpose of any given application. This allows for a comprehensive assessment, quantifying the utility of an application against the suggested criteria. We present a comprehensive analysis of the robustness of the quantifier's work.

Submitted 22 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.17543 [pdf, other]

Fréchet Distance for Offline Evaluation of Information Retrieval Systems with Sparse Labels

Authors: Negar Arabzadeh, Charles L. A. Clarke

Abstract: The rapid advancement of natural language processing, information retrieval (IR), computer vision, and other technologies has presented significant challenges in evaluating the performance of these systems. One of the main challenges is the scarcity of human-labeled data, which hinders the fair and accurate assessment of these systems. In this work, we specifically focus on evaluating IR systems with sparse labels, borrowing from recent research on evaluating computer vision tasks and taking inspiration from the success of using Fréchet Inception Distance (FID) in assessing text-to-image generation systems. We propose leveraging the Fréchet Distance to measure the distance between the distributions of relevant judged items and retrieved results. Our experimental results on the MS MARCO V1 dataset and TREC Deep Learning Tracks query sets demonstrate the effectiveness of the Fréchet Distance as a metric for evaluating IR systems, particularly in settings where only a few labels are available. This approach contributes to the advancement of evaluation methodologies in real-world scenarios such as the assessment of generative IR systems.

Submitted 18 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.07123 [pdf, other]

One Agent Too Many: User Perspectives on Approaches to Multi-agent Conversational AI

Authors: Christopher Clarke, Karthik Krishnamurthy, Walter Talamonti, Yiping Kang, Lingjia Tang, Jason Mars

Abstract: Conversational agents have been gaining increasing popularity in recent years. Influenced by the widespread adoption of task-oriented agents such as Apple Siri and Amazon Alexa, these agents are being deployed into various applications to enhance user experience. Although these agents promote "ask me anything" functionality, they are typically built to focus on a single or finite set of expertise. Given that complex tasks often require more than one expertise, this results in the users needing to learn and adopt multiple agents. One approach to alleviate this is to abstract the orchestration of agents in the background. However, this removes the option of choice and flexibility, potentially harming the ability to complete tasks. In this paper, we explore these different interaction experiences (one agent for all) vs (user choice of agents) for conversational AI. We design prototypes for each, systematically evaluating their ability to facilitate task completion. Through a series of user studies, we show that users have a significant preference for abstracting agent orchestration in both system usability and system performance. Additionally, we demonstrate that this mode of interaction is able to provide quality responses that are rated within 1% of human-selected answers.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.04842 [pdf, other]

Adapting Standard Retrieval Benchmarks to Evaluate Generated Answers

Authors: Negar Arabzadeh, Amin Bigdeli, Charles L. A. Clarke

Abstract: Large language models can now directly generate answers to many factual questions without referencing external sources. Unfortunately, relatively little attention has been paid to methods for evaluating the quality and correctness of these answers, for comparing the performance of one model to another, or for comparing one prompt to another. In addition, the quality of generated answers is rarely directly compared to the quality of retrieved answers. As models evolve and prompts are modified, we have no systematic way to measure improvements without resorting to expensive human judgments. To address this problem we adapt standard retrieval benchmarks to evaluate answers generated by large language models. Inspired by the BERTScore metric for summarization, we explore two approaches. In the first, we base our evaluation on the benchmark relevance judgments. We empirically run experiments on how information retrieval relevance judgments can be utilized as an anchor for evaluating the generated answers. In the second, we compare generated answers to the top results retrieved by a diverse set of retrieval models, ranging from traditional approaches to advanced methods, allowing us to measure improvements without human judgments. In both cases, we measure the similarity between an embedded representation of the generated answer and an embedded representation of a known, or assumed, relevant passage from the retrieval benchmark.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2312.08947 [pdf, ps, other]

EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Westerlund 1 and 2 Open Clusters Survey

Authors: M. G. Guarcello, E. Flaccomio, J. F. Albacete-Colombo, V. Almendros-Abad, K. Anastasopoulou, M. Andersen, C. Argiroffi, A. Bayo, E. S. Bartlett, N. Bastian, M. De Becker, W. Best, R. Bonito, A. Borghese, D. Calzetti, R. Castellanos, C. Cecchi-Pestellini, S. Clark, C. J. Clarke, F. Coti Zelati, F. Damiani, J. J. Drake, M. Gennaro, A. Ginsburg, E. K. Grebel, et al. (26 additional authors not shown)

Abstract: Context. With a mass exceeding several 10^4 solar masses and a rich and dense population of massive stars, supermassive young star clusters represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions among stars. Aims. In this paper we present the "Extended Westerlund 1 and 2 Open Clusters Survey" (EWOCS) project, which aims to investigate the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars. The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun. Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically, the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec. Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation were carried out using the ACIS-Extract software. Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a photon flux threshold of approximately 2x10^-8 photons/cm^2/s. The X-ray sources exhibit a highly concentrated spatial distribution, with 1075 sources located within the central 1 arcminute.
We have successfully detected X-ray emissions from 126 out of the 166 known massive stars of the cluster, and we have collected over 71000 photons from the magnetar CXO J164710.20-455217.

Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: The paper has been accepted for publication by Astronomy and Astrophysics

arXiv:2312.03746 [pdf, ps, other]

Evaluating Large Language Model Creativity from a Literary Perspective

Authors: Murray Shanahan, Catherine Clarke

Abstract: This paper assesses the potential for large language models (LLMs) to serve as assistive tools in the creative writing process, by means of a single, in-depth case study. In the course of the study, we develop interactive and multi-voice prompting strategies that interleave background descriptions (scene setting, plot elements), instructions that guide composition, samples of text in the target style, and critical discussion of the given samples. We qualitatively evaluate the results from a literary critical perspective, as well as from the standpoint of computational creativity (a sub-field of artificial intelligence). Our findings lend support to the view that the sophistication of the results that can be achieved with an LLM mirrors the sophistication of the prompting.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.08950 [pdf, other]

Observing planetesimal formation under streaming instability in the rings of HD 163296

Authors: Francesco Zagaria, Cathie J. Clarke, Richard A. Booth, Stefano Facchini, Giovanni P. Rosotti

Abstract: We introduce a new technique to determine the gas turbulence and surface density in bright disc rings, under the assumption that dust growth is limited by turbulent fragmentation at the ring centre. We benchmark this prescription in HD 163296, showing that our measurements are consistent with available turbulence upper limits and agree with independent estimates of the gas surface density within a factor of two. We combine our results with literature measurements of the dust surface density and grain size to determine the dust-to-gas ratio and Stokes number in the 67 au and 100 au rings. Our estimates suggest that particle clumping is taking place under the effect of streaming instability (SI) in the 100 au ring. Even though in the presence of external isotropic turbulence this process might be hindered, we provide evidence that turbulence is non-isotropic in both rings and likely originating from mechanisms (such as ambipolar diffusion) that could ease particle clumping under SI. Finally, we determine the mass accretion rate under the assumption that the disc is in steady state and turbulence regulates angular momentum transport. Our results are in tension with spectroscopic measurements and suggest that other mechanisms might be responsible for accretion, in qualitative agreement with the detection of a magneto-centrifugal wind in this system. Applying our method to larger samples can be used to statistically assess if SI is a viable mechanism to form planetesimals in bright rings.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 13 pages, 4 figures; accepted for publication in ApJL

arXiv:2310.06215 [pdf, other]

Wakefield Generation in Hydrogen and Lithium Plasmas at FACET-II: Diagnostics and First Beam-Plasma Interaction Results

Authors: D. Storey, C. Zhang, P. San Miguel Claveria, G. J. Cao, E. Adli, L. Alsberg, R. Ariniello, C. Clarke, S. Corde, T. N. Dalichaouch, H. Ekerfelt, C. Emma, E. Gerstmayr, S. Gessner, M. Gilljohann, C. Hast, A. Knetsch, V. Lee, M. Litos, R. Loney, K. A. Marsh, A. Matheron, W. B. Mori, Z. Nie, B. O'Shea, et al. (6 additional authors not shown)

Abstract: Plasma Wakefield Acceleration (PWFA) provides ultrahigh acceleration gradients of 10s of GeV/m, providing a novel path towards efficient, compact, TeV-scale linear colliders and high brightness free electron lasers. Critical to the success of these applications is demonstrating simultaneously high gradient acceleration, high energy transfer efficiency, and preservation of emittance, charge, and energy spread. Experiments at the FACET-II National User Facility at SLAC National Accelerator Laboratory aim to achieve all of these milestones in a single stage plasma wakefield accelerator, providing a 10 GeV energy gain in a <1 m plasma with high energy transfer efficiency. Such a demonstration depends critically on diagnostics able to measure emittance with mm-mrad accuracy, energy spectra to determine both %-level energy spread and broadband energy gain and loss, incoming longitudinal phase space, and matching dynamics. This paper discusses the experimental setup at FACET-II, including the incoming beam parameters from the FACET-II linac, plasma sources, and diagnostics developed to meet this challenge. Initial progress on the generation of beam ionized wakes in meter-scale hydrogen gas is discussed, as well as commissioning of the plasma sources and diagnostics.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.05883 [pdf, other]

Generation of meter-scale hydrogen plasmas and efficient, pump-depletion-limited wakefield excitation using 10 GeV electron bunches

Authors: C. Zhang, D. Storey, P. San Miguel Claveria, Z. Nie, K. A. Marsh, M. Hogan, W. B. Mori, E. Adli, W. An, R. Ariniello, G. J. Cao, C. Clarke, S. Corde, T. Dalichaouch, C. E. Doss, C. Emma, H. Ekerfelt, E. Gerstmayr, S. Gessner, C. Hansel, A. Knetsch, V. Lee, F. Li, M. Litos, B. O'Shea, et al. (4 additional authors not shown)

Abstract: High repetition rates and efficient energy transfer to the accelerating beam are important for a future linear collider based on the beam-driven plasma wakefield acceleration scheme (PWFA-LC). This paper reports the first results from the Plasma Wakefield Acceleration Collaboration (E300) that are beginning to address both of these issues using the recently commissioned FACET-II facility at SLAC. We have generated meter-scale hydrogen plasmas using time-structured 10 GeV electron bunches from FACET-II, which hold the promise of dramatically increasing the repetition rate of PWFA by rapidly replenishing the gas between each shot compared to the hitherto used lithium plasmas that operate at 1-10 Hz. Furthermore, we have excited wakes in such plasmas that are suitable for high gradient particle acceleration with high drive-bunch to wake energy transfer efficiency -- a first step in achieving a high overall energy transfer efficiency. We have done this by using time-structured electron drive bunches that typically have one or more ultra-high current (>30 kA) femtosecond spike(s) superimposed on a longer (~0.4 ps) lower current (<10 kA) bunch structure. The first spike effectively field-ionizes the gas and produces a meter-scale (30-160 cm) plasma, whereas the subsequent beam charge creates a wake. The length and amplitude of the wake depend on the longitudinal current profile of the bunch and plasma density.
We find that the onset of pump depletion, when some of the drive beam electrons are nearly fully depleted of their energy, occurs for hydrogen pressure >1.5 Torr. We also show that some electrons in the rear of the bunch can gain several GeV energies from the wake. These results are reproduced by particle-in-cell simulations using the QPAD code. At a pressure of ~2 Torr, simulation results and experimental data show that the beam transfers about 60% of its energy to the wake.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.11392 [pdf, other]

doi 10.1145/3624918.3625336

Retrieving Supporting Evidence for Generative Question Answering

Authors: Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke

Abstract: Current large language models (LLMs) can exhibit near-human levels of performance on many natural language-based tasks, including open-domain question answering. Unfortunately, at this time, they also convincingly hallucinate incorrect answers, so that responses to questions must be verified against external sources before they can be accepted at face value. In this paper, we report two simple experiments to automatically validate generated answers against a corpus. We base our experiments on questions and passages from the MS MARCO (V1) test collection, and a retrieval pipeline consisting of sparse retrieval, dense retrieval and neural rerankers. In the first experiment, we validate the generated answer in its entirety. After presenting a question to an LLM and receiving a generated answer, we query the corpus with the combination of the question + generated answer. We then present the LLM with the combination of the question + generated answer + retrieved answer, prompting it to indicate if the generated answer can be supported by the retrieved answer. In the second experiment, we consider the generated answer at a more granular level, prompting the LLM to extract a list of factual statements from the answer and verifying each statement separately. We query the corpus with each factual statement and then present the LLM with the statement and the corresponding retrieved evidence. The LLM is prompted to indicate if the statement can be supported and make necessary edits using the retrieved material.
With an accuracy of over 80%, we find that an LLM is capable of verifying its generated answer when a corpus of supporting material is provided. However, manual assessment of a random sample of questions reveals that incorrect generated answers are missed by this verification process. While this verification process can reduce hallucinations, it cannot entirely eliminate them.

Submitted 20 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2306.13781

Journal ref: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (SIGIR-AP '23), November 26--28, 2023, Beijing, China

arXiv:2307.12935 [pdf, other]

Rule By Example: Harnessing Logical Rules for Explainable Hate Speech Detection

Authors: Christopher Clarke, Matthew Hall, Gaurav Mittal, Ye Yu, Sandra Sajeev, Jason Mars, Mei Chen

Abstract: Classic approaches to content moderation typically apply a rule-based heuristic approach to flag content. While rules are easily customizable and intuitive for humans to interpret, they are inherently fragile and lack the flexibility or robustness needed to moderate the vast amount of undesirable content found online today. Recent advances in deep learning have demonstrated the promise of using highly effective deep neural models to overcome these challenges. However, despite the improved performance, these data-driven models lack transparency and explainability, often leading to mistrust from everyday users and a lack of adoption by many platforms. In this paper, we present Rule By Example (RBE): a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation. RBE is capable of providing rule-grounded predictions, allowing for more explainable and customizable predictions compared to typical deep learning-based approaches. We demonstrate that our approach is capable of learning rich rule embedding representations using only a few data examples. Experimental results on 3 popular hate speech classification datasets show that RBE is able to outperform state-of-the-art deep learning classifiers as well as the use of rules in both supervised and unsupervised settings while providing explainable model predictions via rule-grounding.

Submitted 24 July, 2023; originally announced July 2023.

Comments: ACL 2023 Main Conference

arXiv:2307.10798 [pdf, other]

doi 10.1051/0004-6361/202347042

Radio multiwavelength analysis of the compact disk CX Tau: Presence of strong free-free variability or anomalous microwave emission

Authors: Pietro Curone, Leonardo Testi, Enrique Macias, Marco Tazzari, Stefano Facchini, Jonathan P. Williams, Cathie J. Clarke, Antonella Natta, Giovanni Rosotti, Claudia Toci, Giuseppe Lodato

Abstract: Protoplanetary disks emit radiation across a broad range of wavelengths, requiring a multiwavelength approach to fully understand their physical mechanisms and how they form planets. Observations at sub-millimeter to centimeter wavelengths can provide insights into the thermal emission from dust, free-free emission from ionized gas, and possible gyro-synchrotron emission from the stellar magnetosphere. This work is focused on CX Tau, a ${\sim}0.4\,M_\odot$ star with an extended gas emission and a compact and apparently structureless dust disk, with an average millimeter flux when compared to Class II sources in Taurus. We present observations from the Karl G. Jansky Very Large Array (VLA) across four bands (between 9.0 mm and 6.0 cm) and combine them with archival data from the Atacama Large Millimeter/submillimeter Array (ALMA), the Submillimeter Array (SMA) and the Plateau de Bure Interferometer (PdBI). This multiwavelength approach allows us to separate the dust continuum from other emissions. After isolating the dust thermal emission, we derived an upper limit of the dust disk extent at 1.3 cm which is consistent with theoretical predictions of a radial drift-dominated disk. Centimeter data show a peculiar behavior: deep observations at 6.0 cm did not detect the source, while at 1.3 cm the flux density is anomalously higher than adjacent bands. Intraband spectral indices suggest a dominant contribution from free-free emission, whereas gyro-synchrotron emission is excluded.
To explain these observations, we propose a strong variability among the free-free emission with timescales shorter than a month. Another possible interpretation is the presence of anomalous microwave emission from spinning dust grains.

Submitted 22 August, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted for publication in A&A; 13 pages, 13 figures, 1 table

Journal ref: A&A 677, A118 (2023)

arXiv:2306.13781 [pdf, other]

Retrieving Supporting Evidence for LLMs Generated Answers

Authors: Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke

Abstract: Current large language models (LLMs) can exhibit near-human levels of performance on many natural language tasks, including open-domain question answering. Unfortunately, they also convincingly hallucinate incorrect answers, so that responses to questions must be verified against external sources before they can be accepted at face value. In this paper, we report a simple experiment to automatically verify generated answers against a corpus. After presenting a question to an LLM and receiving a generated answer, we query the corpus with the combination of the question + generated answer. We then present the LLM with the combination of the question + generated answer + retrieved answer, prompting it to indicate if the generated answer can be supported by the retrieved answer. We base our experiment on questions and passages from the MS MARCO (V1) test collection, exploring three retrieval approaches ranging from standard BM25 to a full question answering stack, including a reader based on the LLM. For a large fraction of questions, we find that an LLM is capable of verifying its generated answer if appropriate supporting material is provided. However, with an accuracy of 70-80%, this approach cannot be fully relied upon to detect hallucinations.

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2305.16521 [pdf, other]

Label Agnostic Pre-training for Zero-shot Text Classification

Authors: Christopher Clarke, Yuzhao Heng, Yiping Kang, Krisztian Flautner, Lingjia Tang, Jason Mars

Abstract: Conventional approaches to text classification typically assume the existence of a fixed set of predefined labels to which a given text can be classified. However, in real-world applications, there exists an infinite label space for describing a given text. In addition, depending on the aspect (sentiment, topic, etc.) and domain of the text (finance, legal, etc.), the interpretation of the label can vary greatly. This makes the task of text classification, particularly in the zero-shot scenario, extremely challenging. In this paper, we investigate the task of zero-shot text classification with the aim of improving the ability of pre-trained language models (PLMs) to generalize to both seen and unseen data across varying aspects and domains. To solve this we introduce two new simple yet effective pre-training strategies, Implicit and Explicit pre-training. These methods inject aspect-level understanding into the model at train time with the goal of conditioning the model to build task-level understanding. To evaluate this, we construct and release UTCD, a new benchmark dataset for evaluating text classification in zero-shot settings. Experimental results on UTCD show that our approach achieves improved zero-shot generalization on a suite of challenging datasets across an array of zero-shot formalizations.

Submitted 25 May, 2023; originally announced May 2023.

Comments: Findings of ACL 2023

arXiv:2305.14948 [pdf, other]

Music Representing Corpus Virtual: An Open Sourced Library for Explorative Music Generation, Sound Design, and Instrument Creation with Artificial Intelligence and Machine Learning

Authors: Christopher Johann Clarke

Abstract: Music Representing Corpus Virtual (MRCV) is an open source software suite designed to explore the capabilities of Artificial Intelligence (AI) and Machine Learning (ML) in Music Generation, Sound Design, and Virtual Instrument Creation (MGSDIC). The software is accessible to users of varying levels of experience, with an emphasis on providing an explorative approach to MGSDIC. The main aim of MRCV is to facilitate creativity, allowing users to customize input datasets for training the neural networks, and offering a range of options for each neural network (thoroughly documented in the Github Wiki). The software suite is designed to be accessible to musicians, audio professionals, sound designers, and composers, regardless of their prior experience in AI or ML. The documentation is prepared in such a way as to abstract technical details, thereby making it easy to understand. The software is open source, meaning users can contribute to its development, and the community can collectively benefit from the insights and experience of other users.

Submitted 24 May, 2023; originally announced May 2023.

Comments: 16 pages

arXiv:2305.06984 [pdf, other]

Evaluating Open-Domain Question Answering in the Era of Large Language Models

Authors: Ehsan Kamalloo, Nouha Dziri, Charles L. A. Clarke, Davood Rafiei

Abstract: Lexical matching remains the de facto evaluation method for open-domain question answering (QA). Unfortunately, lexical matching fails completely when a plausible candidate answer does not appear in the list of gold answers, which is increasingly the case as we shift from extractive to generative models. The recent success of large language models (LLMs) for QA aggravates lexical matching failures since candidate answers become longer, thereby making matching with the gold answers even more challenging. Without accurate evaluation, the true progress in open-domain QA remains unknown. In this paper, we conduct a thorough analysis of various open-domain QA models, including LLMs, by manually evaluating their answers on a subset of NQ-open, a popular benchmark. Our assessments reveal that while the true performance of all models is significantly underestimated, the performance of the InstructGPT (zero-shot) LLM increases by nearly +60%, making it on par with existing top models, and the InstructGPT (few-shot) model actually achieves a new state-of-the-art on NQ-open. We also find that more than 50% of lexical matching failures are attributed to semantically equivalent answers. We further demonstrate that regex matching ranks QA models consistent with human judgments, although still suffering from unnecessary strictness. Finally, we demonstrate that automated evaluation models are a reasonable surrogate for lexical matching in some circumstances, but not for long-form answers generated by LLMs.
The automated models struggle in detecting hallucinations in LLM answers and are thus unable to evaluate LLMs. At this time, there appears to be no substitute for human evaluation.

Submitted 6 July, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: ACL 2023; code and data released at https://github.com/ehsk/OpenQA-eval

arXiv:2304.13770 [pdf, ps, other]

A novel measurement of the neutron magnetic form factor from A=3 mirror nuclei

Authors: S. N. Santiesteban, S. Li, D. Abrams, S. Alsalmi, D. Androic, K. Aniol, J. Arrington, T. Averett, C. Ayerbe Gayoso, J. Bane, S. Barcus, J. Barrow, A. Beck, V. Bellini, H. Bhatt, D. Bhetuwal, D. Biswas, A. Camsonne, J. Castellanos, J. Chen, J-P. Chen, D. Chrisman, M. E. Christy, C. Clarke, S. Covrig, et al. (81 additional authors not shown)

Abstract: The electromagnetic form factors of the proton and neutron encode information on the spatial structure of their charge and magnetization distributions. While measurements of the proton are relatively straightforward, the lack of a free neutron target makes measurements of the neutron's electromagnetic structure more challenging and more sensitive to experimental or model-dependent uncertainties. Various experiments have attempted to extract the neutron form factors from scattering from the neutron in deuterium, with different techniques providing different, and sometimes large, systematic uncertainties. We present results from a novel measurement of the neutron magnetic form factor using quasielastic scattering from the mirror nuclei $^3$H and $^3$He, where the nuclear effects are larger than for deuterium but expected to largely cancel in the cross-section ratios. We extracted values of the neutron magnetic form factor for low-to-modest momentum transfer, $0.6<Q^2<2.9$ GeV$^2$, where existing measurements give inconsistent results. The precision and $Q^2$ range of this data allow for a better understanding of the current world's data, and suggest a path toward further improvement of our overall understanding of the neutron's magnetic form factor.

Submitted 15 May, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

Journal ref: Phys. Rev. Lett. 132, 162501 (2024)

arXiv:2304.09161 [pdf, other]

doi 10.1145/3578337.3605136

Perspectives on Large Language Models for Relevance Judgment

Authors: Guglielmo Faggioli, Laura Dietz, Charles Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, Henning Wachsmuth

Abstract: When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments along with concerns and issues that arise. We devise a human-machine collaboration spectrum that allows to categorize different relevance judgment strategies, based on how much humans rely on machines. For the extreme point of "fully automated judgments", we further include a pilot experiment on whether LLM-based relevance judgments correlate with judgments from trained human assessors. We conclude the paper by providing opposing perspectives for and against the use of LLMs for automatic relevance judgments, and a compromise perspective, informed by our analyses of the literature, our preliminary experimental evidence, and our experience as IR researchers.

Submitted 18 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

ACM Class: H.3.3

arXiv:2304.01760 [pdf, other]

doi 10.1051/0004-6361/202346164

Testing protoplanetary disc evolution with CO fluxes. A proof of concept in Lupus and Upper Sco

Authors: Francesco Zagaria, Stefano Facchini, Anna Miotello, Carlo F. Manara, Claudia Toci, Cathie J. Clarke

Abstract: The Atacama Large Millimeter/submillimeter Array (ALMA) revolutionised our understanding of protoplanetary discs. However, the available data have not given conclusive answers yet on the underlying disc evolution mechanisms (viscosity or MHD winds). Improving upon the current results, mostly based on the analysis of disc sizes, is difficult because larger, deeper and higher angular resolution surveys would be required, which could be prohibitive even for ALMA. In this Letter, we introduce an alternative method to study disc evolution based on $^{12}$CO fluxes. In fact, fluxes can be readily collected using less time-consuming, lower resolution observations, while tracing the same disc physico-chemical processes as sizes: assuming that $^{12}$CO is optically thick, fluxes scale with the disc surface area. We developed a semi-analytical model to compute $^{12}$CO fluxes and benchmarked it against the results of DALI thermochemical models, recovering an agreement within a factor of three. As a proof of concept we compared our models with Lupus and Upper Sco data, taking advantage of the increased samples, by a factor 1.3 (Lupus) and 3.6 (Upper Sco), when studying fluxes instead of sizes. Models and data agree well only if CO depletion is considered. However, the uncertainties on the initial conditions limited our interpretation of the observations. Our new method can be used to design future ad hoc observational strategies to collect better data and give conclusive answers on disc evolution.

Submitted 7 April, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

Comments: 10 pages, 9 figures; accepted for publication in A&A Letters. Code and data available at https://github.com/fzagaria/COpops.git

Journal ref: A&A 672, L15 (2023)

arXiv:2304.01700 [pdf, other]

Acceleration of a Positron Bunch in a Hollow Channel Plasma

Authors: Spencer Gessner, Erik Adli, James M. Allen, Weiming An, Christine I. Clarke, Chris E. Clayton, Sebastien Corde, Antoine Doche, Joel Frederico, Selina Z. Green, Mark J. Hogan, Chan Joshi, Carl A. Lindstrom, Michael Litos, Kenneth A. Marsh, Warren B. Mori, Brendan O'Shea, Navid Vafaei-Najafabadi, Vitaly Yakimenko

Abstract: Plasmas are a compelling medium for particle acceleration owing to their natural ability to sustain electric fields that are orders of magnitude larger than those available in conventional radio-frequency accelerators. Plasmas are also unique amongst accelerator technologies in that they respond differently to beams of opposite charge. The asymmetric response of a plasma to highly-relativistic electron and positron beams arises from the fact that plasmas are composed of light, mobile electrons and heavy, stationary ions. Hollow channel plasma acceleration is a technique for symmetrizing the response of the plasma, such that it works equally well for high-energy electron and positron beams. In the experiment described here, we demonstrate the generation of a positron beam-driven wake in an extended, annular plasma channel, and acceleration of a second trailing witness positron bunch by the wake. The leading bunch excites the plasma wakefield and loses energy to the plasma, while the witness bunch experiences an accelerating field and gains energy, thus providing a proof-of-concept for hollow channel acceleration of positron beams. At a bunch separation of 330 um, the accelerating gradient is 70 MV/m, the transformer ratio is 0.55, and the energy transfer efficiency is 18% for a drive-to-witness beam charge ratio of 5:1.

Submitted 30 December, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2303.05548 [pdf]

doi 10.1038/s41586-023-05852-9

Light Curves and Colors of the Ejecta from Dimorphos after the DART Impact

Authors: Ariel Graykowski, Ryan A. Lambert, Franck Marchis, Dorian Cazeneuve, Paul A. Dalba, Thomas M. Esposito, Daniel O'Conner Peluso, Lauren A. Sgro, Guillaume Blaclard, Antonin Borot, Arnaud Malvache, Laurent Marfisi, Tyler M. Powell, Patrice Huet, Matthieu Limagne, Bruno Payet, Colin Clarke, Susan Murabana, Daniel Chu Owen, Ronald Wasilwa, Keiichi Fukui, Tateki Goto, Bruno Guillet, Patrick Huth, Satoshi Ishiyama, et al. (19 additional authors not shown)

Abstract: On 26 September 2022 the Double Asteroid Redirection Test (DART) spacecraft impacted Dimorphos, a satellite of the asteroid 65803 Didymos. Because it is a binary system, it is possible to determine how much the orbit of the satellite changed, as part of a test of what is necessary to deflect an asteroid that might threaten Earth with an impact. In nominal cases, pre-impact predictions of the orbital period reduction ranged from ~8.8 - 17.2 minutes. Here we report optical observations of Dimorphos before, during and after the impact, from a network of citizen science telescopes across the world. We find a maximum brightening of 2.29 $\pm$ 0.14 mag upon impact. Didymos fades back to its pre-impact brightness over the course of 23.7 $\pm$ 0.7 days. We estimate lower limits on the mass contained in the ejecta, which was 0.3 - 0.5% Dimorphos' mass depending on the dust size. We also observe a reddening of the ejecta upon impact.

Submitted 9 March, 2023; originally announced March 2023.

Comments: Accepted by Nature

arXiv:2302.11021 [pdf, other]

MVMTnet: A Multi-variate Multi-modal Transformer for Multi-class Classification of Cardiac Irregularities Using ECG Waveforms and Clinical Notes

Authors: Ankur Samanta, Mark Karlov, Meghna Ravikumar, Christian McIntosh Clarke, Jayakumar Rajadas, Kaveh Hassani

Abstract: Deep learning provides an excellent avenue for optimizing diagnosis and patient monitoring for clinical-based applications, which can critically enhance the response time to the onset of various conditions. For cardiovascular disease, one such condition where the rising number of patients increasingly outweighs the availability of medical resources in different parts of the world, a core challenge is the automated classification of various cardiac abnormalities. Existing deep learning approaches have largely been limited to detecting the existence of an irregularity, as in binary classification, which has been achieved using networks such as CNNs and RNN/LSTMs. The next step is to accurately perform multi-class classification and determine the specific condition(s) from the inherently noisy multi-variate waveform, which is a difficult task that could benefit from (1) a more powerful sequential network, and (2) the integration of clinical notes, which provide valuable semantic and clinical context from human doctors. Recently, Transformers have emerged as the state-of-the-art architecture for forecasting and prediction using time-series data, with their multi-headed attention mechanism, and ability to process whole sequences and learn both long and short-range dependencies.
The proposed novel multi-modal Transformer architecture would be able to accurately perform this task while demonstrating the cross-domain effectiveness of Transformers, establishing a method for incorporating multiple data modalities within a Transformer for classification tasks, and laying the groundwork for automating real-time patient condition monitoring in clinical and ER settings.

Submitted 21 February, 2023; originally announced February 2023.

Comments: 18 pages, 11 figures, submitted to Artificial Intelligence in Medicine journal

arXiv:2301.11412 [pdf, other]

doi 10.1093/mnras/stad312

Accretion of sub-stellar companions as the origin of chemical abundance inhomogeneities in globular clusters

Authors: Andrew J. Winter, Cathie J. Clarke

Abstract: Globular clusters exhibit abundance variations, defining 'multiple populations', which have prompted a protracted search for their origin. Properties requiring explanation include: the high fraction of polluted stars ($\sim 40{-}90$ percent, correlated with cluster mass), the absence of pollution in young clusters and the lower pollution rate with binarity and distance from the cluster centre. We present a novel mechanism for late delivery of pollutants into stars via accretion of sub-stellar companions. In this scenario, stars move through a medium polluted with AGB and massive star ejecta, accreting material to produce companions with typical mass ratio $q\sim 0.1$. These companions undergo eccentricity excitation due to dynamical perturbations by passing stars, culminating in a merger with their host star. The accretion of the companion alters surface abundances via injected pollutant. Alongside other self-enrichment models, the companion accretion model can explain the dilution of pollutant and correlation with intra-cluster location. The model also explains the ubiquity and discreteness of the populations and correlations of enrichment rates with cluster mass, cluster age and stellar binarity. Abundance variations in some clusters can be broadly reproduced using AGB and massive binary ejecta abundances from the literature. In other clusters, some high companion mass ratios ($q\gtrsim 1$) are required.
In these cases, the available mass budget necessitates a variable degree of mixing of the polluted material with the primary star, deviations from model ejecta abundances or mixing of internal burning products. We highlight the avenues of further investigation which are required to explore some of the key processes invoked in this model.

Submitted 2 March, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: 29 pages, 20 figures, accepted for publication in MNRAS

arXiv:2301.08197 [pdf, other]

Stochastic entropy production associated with quantum measurement in a framework of Markovian quantum state diffusion

Authors: Claudia L. Clarke, Ian J. Ford

Abstract: The reduced density matrix that characterises the state of an open quantum system is a projection from the full density matrix of the quantum system and its environment, and there are many full density matrices consistent with a given reduced version. Without a specification of relevant details of the environment, the evolution of a reduced density matrix is therefore typically unpredictable, even if the dynamics are deterministic. With this in mind, we investigate a two level open quantum system using a framework of quantum state diffusion. We consider the pseudorandom evolution of its reduced density matrix when subjected to an environment-driven process of continuous quantum measurement of a system observable, using dynamics that asymptotically send the system to an eigenstate. The unpredictability is characterised by a stochastic entropy production, the average of which corresponds to an increase in the subjective uncertainty of the quantum state adopted by the system and environment, given the underspecified dynamics. This differs from a change in von Neumann entropy, and can continue indefinitely as the system is guided towards an eigenstate. As one would expect, the simultaneous measurement of two non-commuting observables within the same framework does not send the system to an eigenstate. Instead, the probability density function describing the reduced density matrix of the system becomes stationary over a continuum of pure states, a situation characterised by zero further stochastic entropy production.
Transitions between such stationary states, brought about by changes in the relative strengths of the two measurement processes, give rise to finite positive mean stochastic entropy production. The framework investigated can offer useful perspectives on both the dynamics and irreversible thermodynamics of measurement in quantum systems.

Submitted 19 January, 2023; originally announced January 2023.

arXiv:2212.13337 [pdf]

Comprehensive evaluations of a prototype full field-of-view photon counting CT system through phantom studies

Authors: Xiaohui Zhan, Ruoqiao Zhang, Xiaofeng Niu, Ilmar Hein, Brent Budden, Shuoxing Wu, Nicolay Markov, Cameron Clarke, Yi Qiang, Hiroki Taguchi, Keiichi Nomura, Yoshihisa Muramatsu, Zhou Yu, Tatsushi Kobayashi, Richard Thompson, Hiroaki Miyazaki, Hiroaki Nakai

Abstract: Photon counting CT (PCCT) has been a research focus in the last two decades. Recent studies and advancements have demonstrated that systems using semiconductor-based photon counting detectors (PCDs) have the potential to provide better contrast, noise and spatial resolution performance compared to conventional scintillator-based systems. With multi-energy threshold detection, PCD can simultaneously provide the photon energy measurement and enable material decomposition for spectral imaging. In this work, we report a performance evaluation of our first CdZnTe-based prototype full-size photon counting CT system through various phantom imaging studies. This prototype system supports a 500 mm scan field-of-view (FOV) and 10 mm cone coverage at isocenter. Phantom scans were acquired using 120 kVp from 50 to 400 mAs to assess the imaging performance on: CT number accuracy, uniformity, noise, spatial resolution, material differentiation and quantification. Both qualitative and quantitative evaluations show that PCCT has superior imaging performance with lower noise and improved spatial resolution compared to conventional energy integrating detector (EID)-CT. Using projection domain material decomposition approach with multiple energy bin measurements, PCCT virtual monoenergetic images (VMIs) have lower noise, and superior performance in quantifying iodine and calcium concentrations. These improvements lead to increased contrast-to-noise ratio (CNR) for both high and low contrast study objects compared to EID-CT. PCCT can also generate super-high resolution (SHR) images using much smaller detector pixel size than EID-CT and dramatically improve image spatial resolution.

Submitted 10 April, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

arXiv:2212.07711 [pdf, other]

Dust dynamics in planet-forming discs in binary systems

Authors: Francesco Zagaria, Giovanni P. Rosotti, Richard D. Alexander, Cathie J. Clarke

Abstract: In multiple stellar systems interactions among the companion stars and their discs affect planet formation. In the circumstellar case tidal truncation makes protoplanetary discs smaller, fainter and less long-lived than those evolving in isolation, thereby reducing the amount of material (gas and dust) available to assemble planetary embryos. On the contrary, in the circumbinary case the reduced accretion can increase the disc lifetime, with beneficial effects on planet formation. In this chapter we review the main observational results on discs in multiple stellar systems and discuss their possible explanations, focusing on recent numerical simulations, mainly dealing with dust dynamics and disc evolution. Finally, some open issues and future research directions are examined.

Submitted 15 December, 2022; originally announced December 2022.

Comments: Accepted for publication in EPJ Plus Focus Point on Environmental and Multiplicity Effects on Planet Formation G. Lodato and C.F. Manara (Guest editors)

arXiv:2210.04189 [pdf, other]

doi 10.1038/s41586-022-05007-2

Revealing the short-range structure of the "mirror nuclei" $^3$H and $^3$He

Authors: S. Li, R. Cruz-Torres, N. Santiesteban, Z. H. Ye, D. Abrams, S. Alsalmi, D. Androic, K. Aniol, J. Arrington, T. Averett, C. Ayerbe Gayoso, J. Bane, S. Barcus, J. Barrow, A. Beck, V. Bellini, H. Bhatt, D. Bhetuwal, D. Biswas, D. Bulumulla, A. Camsonne, J. Castellanos, J. Chen, J-P. Chen, D. Chrisman, et al. (91 additional authors not shown)

Abstract: When protons and neutrons (nucleons) are bound into atomic nuclei, they are close enough together to feel significant attraction, or repulsion, from the strong, short-distance part of the nucleon-nucleon interaction. These strong interactions lead to hard collisions between nucleons, generating pairs of highly-energetic nucleons referred to as short-range correlations (SRCs). SRCs are an important but relatively poorly understood part of nuclear structure and mapping out the strength and isospin structure (neutron-proton vs proton-proton pairs) of these virtual excitations is thus critical input for modeling a range of nuclear, particle, and astrophysics measurements. Hitherto measurements used two-nucleon knockout or ``triple-coincidence'' reactions to measure the relative contribution of np- and pp-SRCs by knocking out a proton from the SRC and detecting its partner nucleon (proton or neutron). These measurements show that SRCs are almost exclusively np pairs, but had limited statistics and required large model-dependent final-state interaction (FSI) corrections. We report on the first measurement using inclusive scattering from the mirror nuclei $^3$H and $^3$He to extract the np/pp ratio of SRCs in the A=3 system. We obtain a measure of the np/pp SRC ratio that is an order of magnitude more precise than previous experiments, and find a dramatic deviation from the near-total np dominance observed in heavy nuclei. This result implies an unexpected structure in the high-momentum wavefunction for $^3$He and $^3$H. Understanding these results will improve our understanding of the short-range part of the N-N interaction.

Submitted 9 October, 2022; originally announced October 2022.

Journal ref: Nature 609, 41-45 (2022)

arXiv:2208.04887 [pdf, other]

Early Stage Sparse Retrieval with Entity Linking

Authors: Dahlia Shehata, Negar Arabzadeh, Charles L. A. Clarke

Abstract: Despite the advantages of their low-resource settings, traditional sparse retrievers depend on exact matching approaches between high-dimensional bag-of-words (BoW) representations of both the queries and the collection. As a result, retrieval performance is restricted by semantic discrepancies and vocabulary gaps. On the other hand, transformer-based dense retrievers introduce significant improvements in information retrieval tasks by exploiting low-dimensional contextualized representations of the corpus. While dense retrievers are known for their relative effectiveness, they suffer from lower efficiency and lack of generalization issues, when compared to sparse retrievers. For a lightweight retrieval task, high computational resources and time consumption are major barriers encouraging the renunciation of dense models despite potential gains. In this work, we propose boosting the performance of sparse retrievers by expanding both the queries and the documents with linked entities in two formats for the entity names: 1) explicit and 2) hashed. We employ a zero-shot end-to-end dense entity linking system for entity recognition and disambiguation to augment the corpus. By leveraging the advanced entity linking methods, we believe that the effectiveness gap between sparse and dense retrievers can be narrowed. We conduct our experiments on the MS MARCO passage dataset. Since we are concerned with the early stage retrieval in cascaded ranking architectures of large information retrieval systems, we evaluate our results using recall@1000. Our approach is also capable of retrieving documents for query subsets judged to be particularly difficult in prior work. We further demonstrate that the non-expanded and the expanded runs with both explicit and hashed entities retrieve complementary results. Consequently, we adopt a run fusion approach to maximize the benefits of entity linking.

Submitted 10 August, 2022; v1 submitted 9 August, 2022; originally announced August 2022.

arXiv:2208.04882 [pdf, other]

Unsupervised Question Clarity Prediction Through Retrieved Item Coherency

Authors: Negar Arabzadeh, Mahsa Seifikar, Charles L. A. Clarke

Abstract: Despite recent progress on conversational systems, they still do not perform smoothly and coherently when faced with ambiguous requests. When questions are unclear, conversational systems should have the ability to ask clarifying questions, rather than assuming a particular interpretation or simply responding that they do not understand. Previous studies have shown that users are more satisfied when asked a clarifying question, rather than receiving an unrelated response. While the research community has paid substantial attention to the problem of predicting query ambiguity in traditional search contexts, researchers have paid relatively little attention to predicting when this ambiguity is sufficient to warrant clarification in the context of conversational systems. In this paper, we propose an unsupervised method for predicting the need for clarification. This method is based on the measured coherency of results from an initial answer retrieval step, under the assumption that a less ambiguous query is more likely to retrieve more coherent results when compared to an ambiguous query. We build a graph from retrieved items based on their context similarity, treating measures of graph connectivity as indicators of ambiguity. We evaluate our approach on two recently released open-domain conversational question answering datasets, ClariQ and AmbigNQ, comparing it with neural and non-neural baselines. Our unsupervised approach performs as well as supervised approaches while providing better generalization.

Submitted 9 August, 2022; originally announced August 2022.

arXiv:2206.13558 [pdf, other]

doi 10.1093/mnras/stac1863

Forming short period sub-stellar companions in 47 Tucanae -- II. Analytic expressions for the orbital evolution of planets in dense environments

Authors: Andrew J. Winter, Cathie J. Clarke, Giovanni Rosotti, Mirek Giersz

Abstract: Short period, massive planets, known as hot Jupiters (HJs), have been discovered around $\sim 1$ percent of local field stars. The inward migration necessary to produce HJs may be `low eccentricity', due to torques in the primordial disc, or `high eccentricity' (HEM). The latter involves exciting high orbital eccentricity, allowing sufficiently close passages with the host star to raise circularising tides in the planet. We present an analytic framework for quantifying the role of dynamical encounters in high density environments during HEM. We show that encounters can enhance or suppress HEM, depending on the local stellar density and the initial semi-major axis $a_0$. For moderate densities, external perturbations can excite large eccentricities that allow a planet to circularise over the stellar lifetime. At extremely high densities, these perturbations can instead result in tidal disruption of the planet, thus yielding no HJ. This may explain the apparent excess of HJs in M67 compared with their local field star abundance versus their apparent deficit in 47 Tuc. Applying our analytic framework, we demonstrate that for an initial massive planet population similar to the field, the expected HJ occurrence rate in 47 Tuc is $f_\mathrm{HJ}=2.2\times 10^{-3}$, which remains consistent with present constraints. Future large (sample sizes $\gtrsim 10^5$) or sensitive transit surveys of stars in globular clusters are required to refute the hypothesis that the initial planet population is similar to the solar neighbourhood average. Non-detection in such surveys would have broad consequences for planet formation theory, implying planet formation rates in globular clusters must be suppressed across a wide range of $a_0$.

Submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted for publication in MNRAS - 28 pages, 17 figures

arXiv:2206.11308 [pdf, other]

doi 10.1093/mnras/stac1770

Super-resolution trends in the ALMA Taurus survey: Structured inner discs and compact discs

Authors: Jeff Jennings, Marco Tazzari, Cathie J. Clarke, Richard A. Booth, Giovanni P. Rosotti

Abstract: The 1.33 mm survey of protoplanetary discs in the Taurus molecular cloud found annular gaps and rings to be common in extended sources ($\gtrsim 55$ au), when their 1D visibility distributions were fit parametrically. We first demonstrate the advantages and limitations of nonparametric visibility fits for data at the survey's 0.12" resolution. Then we use the nonparametric model in Frankenstein ('frank') to identify new substructure in three compact and seven extended sources. Among the new features we identify three trends: a higher occurrence rate of substructure in the survey's compact discs than previously seen, underresolved (potentially azimuthally asymmetric) substructure in the innermost disc of extended sources, and a 'shoulder' on the trailing edge of a ring in discs with strong depletion at small radii. Noting the shoulder morphology is present in multiple discs observed at higher resolution, we postulate it is tracing a common physical mechanism. We further demonstrate how a super-resolution frank brightness profile is useful in motivating an accurate parametric model, using the highly structured source DL Tau in which frank finds two new rings. Finally we show that sparse (u, v) plane sampling may be masking the presence of substructure in several additional compact survey sources.

Submitted 22 June, 2022; originally announced June 2022.

Comments: Accepted to MNRAS

Showing 1–50 of 380 results for author: Clarke, C