Showing 1–2 of 2 results for author: Nix, S

Search v0.5.6 released 2020-02-24

arXiv:2205.01663 [pdf, other]

cs.LG cs.AI cs.CL

Adversarial Training for High-Stakes Reliability

Authors: Daniel M. Ziegler, Seraphina Nix, Lawrence Chan, Tim Bauman, Peter Schmidt-Nielsen, Tao Lin, Adam Scherlis, Noa Nabeshima, Ben Weinstein-Raun, Daniel de Haas, Buck Shlegeris, Nate Thomas

Abstract: In the future, powerful AI systems may be deployed in high-stakes settings, where a single failure could be catastrophic. One technique for improving AI safety in high-stakes settings is adversarial training, which uses an adversary to generate examples to train on in order to achieve better worst-case performance. In this work, we used a safe language generation task ("avoid injuries") as a testbed for achieving high reliability through adversarial training. We created a series of adversarial training techniques -- including a tool that assists human adversaries -- to find and eliminate failures in a classifier that filters text completions suggested by a generator. In our task, we determined that we can set very conservative classifier thresholds without significantly impacting the quality of the filtered outputs. We found that adversarial training increased robustness to the adversarial attacks that we trained on -- doubling the time for our contractors to find adversarial examples both with our tool (from 13 to 26 minutes) and without (from 20 to 44 minutes) -- without affecting in-distribution performance. We hope to see further work in the high-stakes reliability setting, including more powerful tools for enhancing human adversaries and better ways to measure high levels of reliability, until we can confidently rule out the possibility of catastrophic deployment-time failures of powerful models.

Submitted 9 November, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

Comments: 30 pages, 7 figures, NeurIPS camera-ready
arXiv:1807.09391 [pdf, other]

physics.ins-det physics.atom-ph quant-ph

DOI: 10.1016/j.dark.2018.10.002

Characterization of the Global Network of Optical Magnetometers to search for Exotic Physics (GNOME)

Authors: S. Afach, D. Budker, G. DeCamp, V. Dumont, Z. D. Grujić, H. Guo, D. F. Jackson Kimball, T. W. Kornack, V. Lebedev, W. Li, H. Masia-Roig, S. Nix, M. Padniuk, C. A. Palm, C. Pankow, A. Penaflor, X. Peng, S. Pustelny, T. Scholtes, J. A. Smiga, J. E. Stalnaker, A. Weis, A. Wickenbrock, D. Wurm

Abstract: The Global Network of Optical Magnetometers to search for Exotic physics (GNOME) is a network of geographically separated, time-synchronized, optically pumped atomic magnetometers that is being used to search for correlated transient signals heralding exotic physics. The GNOME is sensitive to nuclear- and electron-spin couplings to exotic fields from astrophysical sources such as compact dark-matter objects (for example, axion stars and domain walls). Properties of the GNOME sensors such as sensitivity, bandwidth, and noise characteristics are studied in the present work, and features of the network's operation (e.g., data acquisition, format, storage, and diagnostics) are described. Characterization of the GNOME is a key prerequisite to searches for and identification of exotic physics signatures.

Submitted 24 July, 2018; originally announced July 2018.

Comments: 45 pages, 16 figures, 2 tables

Journal ref: Physics of the Dark Universe 22, 162 (2018)
