Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Authors:
Joshua Harris,
Timothy Laurence,
Leo Loman,
Fan Grayson,
Toby Nonnenmacher,
Harry Long,
Loes WalsGriffith,
Amy Douglas,
Holly Fountain,
Stelios Georgiou,
Jo Hardstaff,
Kathryn Hopkins,
Y-Ling Chi,
Galena Kuyumdzhieva,
Lesley Larkin,
Samuel Collins,
Hamish Mohammed,
Thomas Finnie,
Luke Hounsome,
Steven Riley
Abstract:
Advances in Large Language Models (LLMs) have led to significant interest in their potential to support human experts across a range of domains, including public health. In this work we present automated evaluations of LLMs for public health tasks involving the classification and extraction of free text. We combine six externally annotated datasets with seven new internally annotated datasets to evaluate LLMs for processing text related to: health burden, epidemiological risk factors, and public health interventions. We initially evaluate five open-weight LLMs (7-70 billion parameters) across all tasks using zero-shot in-context learning. We find that Llama-3-70B-Instruct is the highest performing model, achieving the best results on 15/17 tasks (using micro-F1 scores). We see significant variation across tasks with all open-weight LLMs scoring below 60% micro-F1 on some challenging tasks, such as Contact Classification, while all LLMs achieve greater than 80% micro-F1 on others, such as GI Illness Classification. For a subset of 12 tasks, we also evaluate GPT-4 and find comparable results to Llama-3-70B-Instruct, which scores equally or outperforms GPT-4 on 6 of the 12 tasks. Overall, based on these initial results we find promising signs that LLMs may be useful tools for public health experts to extract information from a wide variety of free text sources, and support public health surveillance, research, and interventions.
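The abstract reports results as micro-F1 scores. As a minimal sketch of how that metric pools counts across all labels before computing F1, the snippet below uses hypothetical label names and predictions; it is an illustration under those assumptions, not the authors' evaluation code, which the abstract does not describe.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP/FP/FN over all examples and labels.

    gold, pred: sequences of label collections, one per example
    (a single-label task is just a one-element set per example).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, for illustration only.
gold = [{"gi"}, {"gi", "respiratory"}, {"none"}]
pred = [{"gi"}, {"gi"}, {"respiratory"}]
score = micro_f1(gold, pred)  # TP=2, FP=1, FN=2, so F1 = 4/7
```

For single-label tasks, where each example has exactly one gold and one predicted label, this pooled score coincides with plain accuracy.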
Submitted 23 May, 2024;
originally announced May 2024.
Measurements of Degree-Scale B-mode Polarization with the BICEP/Keck Experiments at South Pole
Authors:
The BICEP/Keck Collaboration,
P. A. R. Ade,
Z. Ahmed,
R. W. Aikin,
K. D. Alexander,
D. Barkats,
S. J. Benton,
C. A. Bischoff,
J. J. Bock,
H. Boenish,
R. Bowens-Rubin,
J. A. Brevik,
I. Buder,
E. Bullock,
V. Buza,
J. Connors,
J. Cornelison,
B. P. Crill,
M. Crumrine,
M. Dierickx,
L. Duband,
C. Dvorkin,
J. P. Filippini,
S. Fliescher,
J. Grayson
, et al. (55 additional authors not shown)
Abstract:
The BICEP and Keck Array experiments are a suite of small-aperture refracting telescopes observing the microwave sky from the South Pole. They target the degree-scale B-mode polarization signal imprinted in the Cosmic Microwave Background (CMB) by primordial gravitational waves. Such a measurement would shed light on the physics of the very early universe. While BICEP2 observed for the first time a B-mode signal at 150 GHz, higher frequencies from the Planck satellite showed that it could be entirely due to the polarized emission from Galactic dust, though uncertainty remained high. Keck Array has been observing the same region of the sky for several years, with an increased detector count, producing the deepest polarized CMB maps to date. New detectors at 95 GHz were installed in 2014, and at 220 GHz in 2015. These observations enable a better constraint of galactic foreground emissions, as presented here. In 2015, BICEP2 was replaced by BICEP3, a 10 times higher throughput telescope observing at 95 GHz, while Keck Array is now focusing on higher frequencies. In the near future, BICEP Array will replace Keck Array, and will allow unprecedented sensitivity to the gravitational wave signal. High resolution observations from the South Pole Telescope (SPT) will also be used to remove the lensing contribution to B-modes.
Submitted 27 October, 2018; v1 submitted 5 July, 2018;
originally announced July 2018.