-
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Authors:
Zeming Chen,
Alejandro Hernández Cano,
Angelika Romanou,
Antoine Bonnet,
Kyle Matoba,
Francesco Salvi,
Matteo Pagliardini,
Simin Fan,
Andreas Köpf,
Amirkeivan Mohtashami,
Alexandre Sallinen,
Alireza Sakhaeirad,
Vinitra Swamy,
Igor Krawczuk,
Deniz Bayazit,
Axel Marmet,
Syrielle Montariol,
Mary-Anne Hartley,
Martin Jaggi,
Antoine Bosselut
Abstract:
Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.
Submitted 27 November, 2023;
originally announced November 2023.
-
OpenAssistant Conversations -- Democratizing Large Language Model Alignment
Authors:
Andreas Köpf,
Yannic Kilcher,
Dimitri von Rütte,
Sotiris Anagnostidis,
Zhi-Rui Tam,
Keith Stevens,
Abdullah Barhoum,
Nguyen Minh Duc,
Oliver Stanley,
Richárd Nagyfi,
Shahul ES,
Sameer Suri,
David Glushkov,
Arnav Dantuluri,
Andrew Maguire,
Christoph Schuhmann,
Huu Nguyen,
Alexander Mattick
Abstract:
Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the required skill and domain knowledge to effectively harness the capabilities of LLMs, increasing their accessibility and utility across various domains. However, state-of-the-art alignment techniques like RLHF rely on high-quality human feedback data, which is expensive to create and often remains proprietary. In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations, a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 complete and fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers. Models trained on OpenAssistant Conversations show consistent improvements on standard benchmarks over respective base models. We release our code and data under a fully permissive licence.
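The corpus is organized as conversation trees: a prompt can receive several alternative replies, each reply can receive follow-ups, and messages carry quality ratings. The Python sketch below shows one way to represent such a tree and enumerate its root-to-leaf threads, each of which is a linear conversation of the kind used for supervised fine-tuning. The `Message` class, its fields, and the mean-rating thread selection are hypothetical illustrations, not the released dataset's schema or the authors' training recipe.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Message:
    """One node of a conversation tree (hypothetical schema)."""
    role: str                                            # "prompter" or "assistant"
    text: str
    ratings: list[float] = field(default_factory=list)   # per-annotator quality scores
    children: list[Message] = field(default_factory=list)

    def quality(self) -> float:
        # Mean rating; unrated messages score 0.0 in this sketch.
        return mean(self.ratings) if self.ratings else 0.0


def threads(root: Message) -> list[list[Message]]:
    """Return every root-to-leaf path; each path is one linear conversation."""
    if not root.children:
        return [[root]]
    paths = []
    for child in root.children:
        for tail in threads(child):
            paths.append([root] + tail)
    return paths


def best_thread(root: Message) -> list[Message]:
    """Pick the thread whose messages have the highest mean quality."""
    return max(threads(root), key=lambda path: mean(m.quality() for m in path))


if __name__ == "__main__":
    tree = Message("prompter", "What is RLHF?", children=[
        Message("assistant", "Reinforcement learning from human feedback ...", [0.9, 0.8]),
        Message("assistant", "No idea.", [0.1], children=[
            Message("prompter", "Please try again."),
        ]),
    ])
    for path in threads(tree):
        print(" -> ".join(m.role for m in path))
    print(best_thread(tree)[-1].text)
```

Enumerating threads recursively keeps every alternative branch, so one tree with several ranked replies yields several training conversations rather than one.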
Submitted 31 October, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
The Dayside Ionopause of Mars: Solar Wind Interaction, Pressure Balance, and Comparisons with Venus
Authors:
F. Chu,
Z. Girazian,
F. Duru,
R. Ramstad,
J. Halekas,
D. A. Gurnett,
Xin Cao,
A. J. Kopf
Abstract:
Due to the lower ionospheric thermal pressure and the presence of crustal magnetic fields at Mars, the Martian ionopause is expected to behave differently from the ionopause at Venus. We study the solar wind interaction and pressure balance at the ionopause of Mars using both in situ and remote sounding measurements from the MARSIS (Mars Advanced Radar for Subsurface and Ionosphere Sounding) instrument on the Mars Express orbiter. We show that magnetic pressure usually dominates thermal pressure in holding off the solar wind at the Martian ionopause; the ionospheric thermal pressure plays the more important role in pressure balance in only 13% of cases. At Venus, by contrast, this fraction is as high as 65%. We also find that the ionopause altitude at Mars decreases as the normal component of the solar wind dynamic pressure increases, similar to the altitude variation of the ionopause at Venus. Moreover, our results show that the ionopause thickness at Mars and Venus is mainly determined by ion gyromotion and is equivalent to about 5 ion gyroradii.
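The pressure-balance comparison in this abstract pits the magnetic pressure $B^2/(2\mu_0)$ against the ionospheric thermal pressure $n k_B (T_e + T_i)$, and relates the ionopause thickness to the ion gyroradius $r_g = m v_\perp/(|q| B)$. The Python sketch below evaluates these standard formulas. The input values are hypothetical round numbers chosen for illustration, not MARSIS measurements, and the thermal-pressure form assumes a quasi-neutral plasma with equal electron and ion densities.

```python
import math

MU0 = 4e-7 * math.pi          # vacuum permeability [T m / A]
K_B = 1.380649e-23            # Boltzmann constant [J / K]
Q_E = 1.602176634e-19         # elementary charge [C]
AMU = 1.66053906660e-27       # atomic mass unit [kg]


def magnetic_pressure(b_nT: float) -> float:
    """Magnetic pressure B^2 / (2 mu0) in pascals, for B in nanotesla."""
    b = b_nT * 1e-9
    return b * b / (2 * MU0)


def thermal_pressure(n_cm3: float, te_K: float, ti_K: float) -> float:
    """Plasma thermal pressure n k_B (Te + Ti) in pascals (assumes n_e = n_i = n)."""
    n = n_cm3 * 1e6               # cm^-3 -> m^-3
    return n * K_B * (te_K + ti_K)


def gyroradius_km(mass_amu: float, v_perp_km_s: float, b_nT: float, charge: int = 1) -> float:
    """Ion gyroradius m v_perp / (|q| B), returned in kilometres."""
    m = mass_amu * AMU
    v = v_perp_km_s * 1e3
    b = b_nT * 1e-9
    return m * v / (abs(charge) * Q_E * b) / 1e3


if __name__ == "__main__":
    # Hypothetical ionopause conditions (illustrative only).
    p_b = magnetic_pressure(30.0)                 # 30 nT field
    p_th = thermal_pressure(1e3, 3000.0, 1000.0)  # 1000 cm^-3, Te = 3000 K, Ti = 1000 K
    print(f"P_B  = {p_b * 1e9:.3f} nPa")
    print(f"P_th = {p_th * 1e9:.3f} nPa")
    print("magnetic pressure dominates" if p_b > p_th else "thermal pressure dominates")
    # O+ ion (16 amu) with a 3 km/s perpendicular speed in the same 30 nT field
    r_g = gyroradius_km(16.0, 3.0, 30.0)
    print(f"r_g = {r_g:.1f} km, 5 r_g = {5 * r_g:.1f} km")
```

Classifying each observed case then reduces to comparing the two pressures, which is the kind of tally behind the 13% versus 65% figures.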
Submitted 1 November, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Authors:
Adam Paszke,
Sam Gross,
Francisco Massa,
Adam Lerer,
James Bradbury,
Gregory Chanan,
Trevor Killeen,
Zeming Lin,
Natalia Gimelshein,
Luca Antiga,
Alban Desmaison,
Andreas Köpf,
Edward Yang,
Zach DeVito,
Martin Raison,
Alykhan Tejani,
Sasank Chilamkurthy,
Benoit Steiner,
Lu Fang,
Junjie Bai,
Soumith Chintala
Abstract:
Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it provides an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.
In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.
We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several common benchmarks.
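The "code as a model" principle means the computation graph is simply whatever ordinary Python code executes, including its loops and branches (often called define-by-run). To make that idea concrete without depending on PyTorch itself, the sketch below implements a toy scalar reverse-mode autodiff in pure Python. The `Value` class and the `model` function are hypothetical and are not PyTorch's implementation or API, which operates on tensors and is far more elaborate.

```python
class Value:
    """A scalar that records the operations producing it (a toy autograd node)."""

    def __init__(self, data: float, parents: tuple = ()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # pushes self.grad back to the parents

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Value) else Value(x)

    def __add__(self, other):
        other = Value._wrap(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = Value._wrap(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Topologically order the graph the Python code just built, then apply
        # the chain rule from this output back to every input.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


def model(x: Value, steps: int) -> Value:
    # Ordinary Python control flow decides the graph's shape at run time.
    y = x
    for _ in range(steps):
        y = y * x if y.data < 100 else y + x
    return y


if __name__ == "__main__":
    x = Value(2.0)
    y = model(x, 2)   # computes x * x * x for this input
    y.backward()
    print(y.data, x.grad)
```

Because the branch in `model` is plain Python, a different input can produce a different graph: with `x = Value(20.0)` the second step takes the addition branch, and `backward()` differentiates exactly the path that ran. Debugging is correspondingly ordinary, since every intermediate `Value` can be inspected with `print` or a debugger.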
Submitted 3 December, 2019;
originally announced December 2019.
-
Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data
Authors:
Andreas Kopf,
Vincent Fortuin,
Vignesh Ram Somnath,
Manfred Claassen
Abstract:
Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering has gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower-dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and challenging real-world tasks of clustering mouse organs from single-cell RNA-sequencing measurements and defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods.
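Two ingredients of this model can be illustrated in isolation: a Gaussian mixture over the latent space, whose posterior responsibilities give soft cluster assignments, and a Mixture-of-Experts decoder that combines one expert per mode. The pure-Python sketch below wires these together for a 2-D latent code. The diagonal-covariance mixture, the responsibility-weighted combination of experts, and the toy linear experts are all assumptions made for illustration; they do not reproduce the paper's trained networks or its similarity-based loss.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def log_gaussian_diag(z: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at z."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (zi - mi) ** 2 / v)
        for zi, mi, v in zip(z, mean, var)
    )


def responsibilities(z, weights, means, variances) -> Vector:
    """Posterior p(cluster k | z) under the mixture, using log-sum-exp for stability."""
    logs = [math.log(w) + log_gaussian_diag(z, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_decode(z, experts: Sequence[Callable[[Sequence[float]], Vector]],
               gate: Sequence[float]) -> Vector:
    """Combine expert reconstructions, weighting each by its gate value."""
    outputs = [expert(z) for expert in experts]
    dim = len(outputs[0])
    return [sum(g * out[d] for g, out in zip(gate, outputs)) for d in range(dim)]


if __name__ == "__main__":
    # Two latent clusters and two hypothetical experts (toy linear decoders).
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [4.0, 4.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    experts = [
        lambda z: [z[0] + 1.0, z[1] + 1.0, 0.0],
        lambda z: [2.0 * z[0], 2.0 * z[1], 1.0],
    ]
    z = [0.2, -0.1]
    gate = responsibilities(z, weights, means, variances)
    cluster = max(range(len(gate)), key=gate.__getitem__)
    print("responsibilities:", [round(g, 4) for g in gate], "-> cluster", cluster)
    print("reconstruction:", moe_decode(z, experts, gate))
```

Taking the arg-max responsibility gives a hard cluster label, while the same responsibilities route the latent code to the experts, so clustering and generation share one latent model.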
Submitted 18 December, 2020; v1 submitted 17 October, 2019;
originally announced October 2019.
-
Mars' plasma system. Scientific potential of coordinated multi-point missions: "The next generation" (A White Paper submitted to ESA's Voyage 2050 Call)
Authors:
Beatriz Sánchez-Cano,
Mark Lester,
David J. Andrews,
Hermann Opgenoorth,
Robert Lillis,
François Leblanc,
Christopher M. Fowler,
Xiaohua Fang,
Oleg Vaisberg,
Majd Mayyasi,
Mika Holmberg,
Jingnan Guo,
Maria Hamrin,
Christian Mazelle,
Kerstin Peter,
Martin Pätzold,
Katerina Stergiopoulou,
Charlotte Goetz,
Vladimir Nikolaevich Ermakov,
Sergei Shuvalov,
James Wild,
Pierre-Louis Blelly,
Michael Mendillo,
Cesar Bertucci,
Marco Cartacci, et al. (5 additional authors not shown)
Abstract:
The objective of this White Paper submitted to ESA's Voyage 2050 call is to gain a more holistic understanding of the dynamics of the Martian plasma system, from its surface up to the undisturbed solar wind outside the induced magnetosphere. This can only be achieved with coordinated multi-point observations with high temporal resolution, as they have the scientific potential to track the whole dynamics of the system (from small to large scales), and they constitute the next generation of Mars exploration, as happened at Earth a few decades ago. This White Paper discusses the key science questions that are still open at Mars and how they could be addressed with coordinated multi-point missions. The main science questions are: (i) How does solar wind driving impact magnetospheric and ionospheric dynamics? (ii) What is the structure and nature of the tail of Mars' magnetosphere at all scales? (iii) How does the lower atmosphere couple to the upper atmosphere? (iv) Why should we have a permanent in-situ Space Weather monitor at Mars? Each science question is devoted to a specific plasma region and includes several specific scientific objectives to study in the coming decades. In addition, two mission concepts are proposed based on coordinated multi-point science from a constellation of orbiting and ground-based platforms, which focus on understanding and solving the current science gaps.
Submitted 15 August, 2019;
originally announced August 2019.
-
Variations in the Ionospheric Peak Altitude at Mars in Response to Dust Storms: 13 Years of Observations from the Mars Express Radar Sounder
Authors:
Z. Girazian,
Z. Luppen,
D. D. Morgan,
F. Chu,
L. Montabone,
E. M. B. Thiemann,
D. A. Gurnett,
J. Halekas,
A. J. Kopf,
F. Nemec
Abstract:
Previous observations have shown that, during Martian dust storms, the peak of the ionosphere rises in altitude. Observational studies of this type, however, have been extremely limited. Using 13 years of ionospheric peak altitude data from the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS) instrument on Mars Express, we study how the peak altitude responded to dust storms during six different Mars Years (MY). The peak altitude increased $\sim$10-15 km during all six events, which include a local dust storm (MY 33), three regional dust storms (MYs 27, 29, and 32), and two global dust storms (MYs 28 and 34). The peak altitude's orbit-to-orbit variability was exceptionally large at the apexes of the MY 29 and MY 32 dust seasons, and dramatically increased during the MY 28 and MY 34 global dust storms. We conclude that dust storms significantly increase upper atmospheric variability, which suggests that they enhance dynamical processes that couple the lower and upper atmospheres, such as upward propagating gravity waves or atmospheric tides.
Submitted 19 August, 2019; v1 submitted 17 June, 2019;
originally announced June 2019.
-
The Effects of Solar Wind Dynamic Pressure on the Structure of the Topside Ionosphere of Mars
Authors:
Z. Girazian,
J. Halekas,
D. D. Morgan,
A. J. Kopf,
D. A. Gurnett,
F. Chu
Abstract:
We use Mars Atmosphere and Volatile EvolutioN observations of the upstream solar wind, and Mars Express observations of ionospheric electron densities and magnetic fields, to study how the topside ionosphere ($>$ 320 km) of Mars is affected by variations in solar wind dynamic pressure. We find that high solar wind dynamic pressures result in the topside ionosphere being depleted of plasma at all solar zenith angles, coincident with increased induced magnetic field strengths. The depletion of topside plasma in response to high solar wind dynamic pressures is observed in both weak and strong crustal magnetic field regions. Taken together, our results suggest that high solar wind dynamic pressures lead to ionospheric compression, increased ion escape, and reduced day-to-night plasma transport in the high-altitude nightside ionosphere.
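Solar wind dynamic pressure is $P_{\rm dyn} = \rho v^2$, commonly approximated as $n m_p v^2$ when protons dominate the mass density. The Python sketch below computes it from upstream density and speed, then computes the field strength $B = \sqrt{2\mu_0 P_{\rm dyn}}$ whose magnetic pressure alone would balance it, a rough way to see why higher upstream pressure goes with stronger induced fields. The proton-only mass density, the neglect of the flow's incidence angle, and the sample inputs are simplifying assumptions, not the study's method or data.

```python
import math

M_P = 1.67262192369e-27     # proton mass [kg]
MU0 = 4e-7 * math.pi        # vacuum permeability [T m / A]


def dynamic_pressure_nPa(n_cm3: float, v_km_s: float) -> float:
    """Solar wind dynamic pressure n m_p v^2 in nanopascals (protons only)."""
    n = n_cm3 * 1e6          # cm^-3 -> m^-3
    v = v_km_s * 1e3         # km/s -> m/s
    return n * M_P * v * v * 1e9


def balancing_field_nT(p_nPa: float) -> float:
    """Field strength whose magnetic pressure B^2 / (2 mu0) equals p, in nanotesla."""
    return math.sqrt(2 * MU0 * p_nPa * 1e-9) * 1e9


if __name__ == "__main__":
    # Hypothetical upstream conditions at Mars (illustrative only).
    for n, v in [(2.0, 400.0), (6.0, 600.0)]:
        p = dynamic_pressure_nPa(n, v)
        print(f"n = {n} cm^-3, v = {v} km/s -> P_dyn = {p:.2f} nPa, "
              f"balancing B = {balancing_field_nT(p):.0f} nT")
```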
Submitted 31 July, 2019; v1 submitted 28 May, 2019;
originally announced May 2019.
-
The effects of crustal magnetic fields and solar EUV flux on ionopause formation at Mars
Authors:
F. Chu,
Z. Girazian,
D. A. Gurnett,
D. D. Morgan,
J. Halekas,
A. J. Kopf,
E. M. B. Thiemann,
F. Duru
Abstract:
We study the ionopause of Mars using a database of 6,893 ionopause detections obtained over 11 years by the MARSIS (Mars Advanced Radar for Subsurface and Ionosphere Sounding) experiment. The ionopause, in this work, is defined as a steep density gradient that appears in MARSIS remote sounding ionograms as a horizontal line at frequencies below 0.4 MHz. We find that the ionopause is located on average at an altitude of $363 \pm 65$ km. We also find that the ionopause altitude has a weak dependence on solar zenith angle and varies with the solar extreme ultraviolet (EUV) flux on annual and solar cycle time scales. Furthermore, our results show that very few ionopauses are observed when the crustal field strength at 400 km is greater than 40 nT. The strong crustal fields act as mini-magnetospheres that alter the solar wind interaction and prevent the ionopause from forming.
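A topside sounder's pulse at frequency $f$ reflects from the altitude where the local electron plasma frequency equals $f$, and the standard relation $f_p = \frac{1}{2\pi}\sqrt{n_e e^2/(\varepsilon_0 m_e)} \approx 8.98\ \mathrm{kHz}\times\sqrt{n_e/\mathrm{cm^{-3}}}$ links a sounding frequency to an electron density. The Python sketch below uses this relation to convert the 0.4 MHz threshold into the corresponding density. Reading that threshold as a simple density cutoff is an illustrative simplification, not the paper's detection criterion.

```python
import math

E = 1.602176634e-19        # elementary charge [C]
EPS0 = 8.8541878128e-12    # vacuum permittivity [F/m]
M_E = 9.1093837015e-31     # electron mass [kg]


def plasma_frequency_hz(n_cm3: float) -> float:
    """Electron plasma frequency sqrt(n e^2 / (eps0 m_e)) / (2 pi), with n in cm^-3."""
    n = n_cm3 * 1e6
    return math.sqrt(n * E * E / (EPS0 * M_E)) / (2 * math.pi)


def density_cm3(f_hz: float) -> float:
    """Inverse relation: electron density (cm^-3) whose plasma frequency is f_hz."""
    omega = 2 * math.pi * f_hz
    return omega * omega * EPS0 * M_E / (E * E) / 1e6


if __name__ == "__main__":
    n_cut = density_cm3(0.4e6)
    print(f"0.4 MHz  <->  n_e ~ {n_cut:.0f} cm^-3")
    print(f"check: f_p({n_cut:.0f} cm^-3) = {plasma_frequency_hz(n_cut) / 1e6:.3f} MHz")
```

Because $f_p$ scales as $\sqrt{n_e}$, a horizontal ionogram trace below 0.4 MHz corresponds to densities of order $2\times10^3\ \mathrm{cm^{-3}}$ or less.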
Submitted 29 August, 2019; v1 submitted 25 March, 2019;
originally announced March 2019.
-
MARSIS observations of field-aligned irregularities and ducted radio propagation in the Martian ionosphere
Authors:
David J. Andrews,
Hermann J. Opgenoorth,
Thomas B. Leyser,
Stephan Buchert,
Niklas J. T. Edberg,
David D. Morgan,
Donald A. Gurnett,
Andrew J. Kopf,
Katy Fallows,
Paul Withers
Abstract:
Knowledge of Mars's ionosphere has been significantly advanced in recent years by observations from Mars Express (MEX) and, more recently, MAVEN. A topic of particular interest is the interaction between the planet's ionospheric plasma and its highly structured crustal magnetic fields, and how this leads to the redistribution of plasma and affects the propagation of radio waves in the system. In this paper, we elucidate a possible relationship between two anomalous radar signatures previously reported in observations from the MARSIS instrument on MEX. Relatively uncommon observations of localized, extreme increases in the ionospheric peak density in regions of radial (cusp-like) magnetic fields and spread-echo radar signatures are shown to be coincident with ducting of the same radar pulses at higher altitudes on the same field lines. We suggest that these two observations are both caused by a high electric field (perpendicular to $\mathbf{B}$) having distinctly different effects in two altitude regimes. At lower altitudes, where ions are demagnetized, electrons are magnetized, and recombination dominates, a high electric field causes irregularities, plasma turbulence, electron heating, slower recombination, and ultimately enhanced plasma densities. At higher altitudes, however, where both ions and electrons are magnetized and atomic oxygen ions cannot recombine directly, the high electric field instead causes frictional heating, faster production of molecular ions by charge exchange, and hence a density decrease. The latter enables ducting of radar pulses on closed field lines, in a fashion analogous to inter-hemispheric ducting in the Earth's ionosphere.
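Whether a species is "magnetized" is usually judged by comparing its gyrofrequency $\Omega = |q|B/m$ with its collision frequency $\nu$: for $\Omega/\nu > 1$ particles complete gyrations between collisions and drift across $\mathbf{B}$, while for $\Omega/\nu < 1$ collisions dominate. The Python sketch below applies this standard criterion. The field strength and the collision frequencies are hypothetical placeholders, not Martian values from the paper, chosen only to reproduce the two regimes the abstract describes.

```python
Q_E = 1.602176634e-19       # elementary charge [C]
M_E = 9.1093837015e-31      # electron mass [kg]
AMU = 1.66053906660e-27     # atomic mass unit [kg]


def gyrofrequency(mass_kg: float, b_nT: float, charge: int = 1) -> float:
    """Angular gyrofrequency |q| B / m in rad/s."""
    return abs(charge) * Q_E * b_nT * 1e-9 / mass_kg


def is_magnetized(mass_kg: float, b_nT: float, collision_hz: float) -> bool:
    """True when the gyrofrequency exceeds the collision frequency (Omega / nu > 1)."""
    return gyrofrequency(mass_kg, b_nT) / collision_hz > 1.0


if __name__ == "__main__":
    b = 100.0  # hypothetical crustal-field strength [nT]
    # Hypothetical (ion, electron) collision frequencies [s^-1] in two altitude regimes.
    regimes = {"low altitude": (50.0, 5000.0), "high altitude": (0.1, 10.0)}
    o2_plus = 32 * AMU
    for name, (nu_ion, nu_e) in regimes.items():
        print(f"{name}: ions magnetized={is_magnetized(o2_plus, b, nu_ion)}, "
              f"electrons magnetized={is_magnetized(M_E, b, nu_e)}")
```

Since electrons are roughly $6\times10^4$ times lighter than $\mathrm{O_2^+}$, their gyrofrequency is correspondingly higher, which is why a band exists where electrons are magnetized but ions are not.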
Submitted 15 August, 2018;
originally announced August 2018.
-
Photon Dispersion in a Supernova Core
Authors:
A. Kopf,
G. Raffelt
Abstract:
While the photon forward-scattering amplitude on free magnetic dipoles (e.g., free neutrons) vanishes, the nucleon magnetic moments still contribute significantly to the photon dispersion relation in a supernova (SN) core, where the nucleon spins are not free because of their interactions. We study the frequency dependence of the relevant spin susceptibility in a toy model with only neutrons, which interact by one-pion exchange. Our approach amounts to calculating the photon absorption rate from the inverse bremsstrahlung process $\gamma n n \to n n$ and then deriving the refractive index $n_{\rm refr}$ with the help of the Kramers-Kronig relation. In the static limit ($\omega \to 0$) the dispersion relation is governed by the Pauli susceptibility $\chi_{\rm Pauli}$, so that $n_{\rm refr}^2 - 1 \approx \chi_{\rm Pauli} > 0$. For $\omega$ somewhat above the neutron spin-relaxation rate $\Gamma_\sigma$ we find $n_{\rm refr}^2 - 1 < 0$, and for $\omega \gg \Gamma_\sigma$ the photon dispersion relation acquires the form $\omega^2 - k^2 = m_\gamma^2$. An exact expression for the "transverse photon mass" $m_\gamma$ is given in terms of the f-sum of the neutron spin autocorrelation function; an estimate is $m_\gamma^2 \approx \chi_{\rm Pauli} T \Gamma_\sigma$. The dominant contribution to $n_{\rm refr}$ in a SN core remains the electron plasma frequency, so the Cherenkov processes $\gamma \nu \leftrightarrow \nu$ remain forbidden for all photon frequencies.
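The step from the high-frequency dispersion relation to the sign of $n_{\rm refr}^2 - 1$, and the kinematic reason a photon with $\omega^2 > k^2$ cannot take part in the Cherenkov processes, can be written out in natural units ($\hbar = c = 1$, metric signature $+,-,-,-$) using the phase-velocity definition $n_{\rm refr} = k/\omega$. The short derivation below is a standard kinematic argument added for illustration, not taken from the paper, and it treats the neutrino as massless; $K$ denotes the photon four-momentum.

```latex
\begin{aligned}
n_{\rm refr}^2 - 1 &= \frac{k^2 - \omega^2}{\omega^2}
  = -\frac{m_\gamma^2}{\omega^2} < 0
  \qquad (\omega \gg \Gamma_\sigma,\ \omega^2 - k^2 = m_\gamma^2), \\[4pt]
\nu(p) \to \nu(p') + \gamma(K):\quad K^2 &= (p - p')^2 = -2\,p\cdot p' \le 0
  \qquad (p^2 = p'^2 = 0).
\end{aligned}
```

A timelike photon, $K^2 = \omega^2 - k^2 > 0$ (equivalently $n_{\rm refr} < 1$), therefore cannot be emitted or absorbed by a massless neutrino; Cherenkov emission would require $n_{\rm refr} > 1$.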
Submitted 18 November, 1997;
originally announced November 1997.