-
19 Parameters Is All You Need: Tiny Neural Networks for Particle Physics
Authors:
Alexander Bogatskiy,
Timothy Hoffman,
Jan T. Offermann
Abstract:
As particle accelerators increase their collision rates, and deep learning solutions prove their viability, there is a growing need for lightweight and fast neural network architectures for low-latency tasks such as triggering. We examine the potential of one recent Lorentz- and permutation-symmetric architecture, PELICAN, and present its instances with as few as 19 trainable parameters that outperform generic architectures with tens of thousands of parameters when compared on the binary classification task of top quark jet tagging.
Submitted 13 December, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Explainable Equivariant Neural Networks for Particle Physics: PELICAN
Authors:
Alexander Bogatskiy,
Timothy Hoffman,
David W. Miller,
Jan T. Offermann,
Xiaoyang Liu
Abstract:
PELICAN is a novel permutation equivariant and Lorentz invariant or covariant aggregator network designed to overcome common limitations found in architectures applied to particle physics problems. Compared to many approaches that use non-specialized architectures that neglect underlying physics principles and require very large numbers of parameters, PELICAN employs a fundamentally symmetry group-based architecture that demonstrates benefits in terms of reduced complexity, increased interpretability, and raw performance. We present a comprehensive study of the PELICAN algorithm architecture in the context of both tagging (classification) and reconstructing (regression) Lorentz-boosted top quarks, including the difficult task of specifically identifying and measuring the $W$-boson inside the dense environment of the Lorentz-boosted top-quark hadronic final state. We also extend the application of PELICAN to the tasks of identifying quark-initiated vs. gluon-initiated jets, and a multi-class identification across five separate target categories of jets. When tested on the standard task of Lorentz-boosted top-quark tagging, PELICAN outperforms existing competitors with much lower model complexity and high sample efficiency. On the less common and more complex task of 4-momentum regression, PELICAN also outperforms hand-crafted, non-machine learning algorithms. We discuss the implications of symmetry-restricted architectures for the wider field of machine learning for physics.
Submitted 23 February, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
PELICAN: Permutation Equivariant and Lorentz Invariant or Covariant Aggregator Network for Particle Physics
Authors:
Alexander Bogatskiy,
Timothy Hoffman,
David W. Miller,
Jan T. Offermann
Abstract:
Many current approaches to machine learning in particle physics use generic architectures that require large numbers of parameters and disregard underlying physics principles, limiting their applicability as scientific modeling tools. In this work, we present a machine learning architecture that uses a set of inputs maximally reduced with respect to the full 6-dimensional Lorentz symmetry, and is fully permutation-equivariant throughout. We study the application of this network architecture to the standard task of top quark tagging and show that the resulting network outperforms all existing competitors despite much lower model complexity. In addition, we present a Lorentz-covariant variant of the same network applied to a 4-momentum regression task.
Submitted 23 December, 2022; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Symmetry Group Equivariant Architectures for Physics
Authors:
Alexander Bogatskiy,
Sanmay Ganguly,
Thomas Kipf,
Risi Kondor,
David W. Miller,
Daniel Murnane,
Jan T. Offermann,
Mariel Pettee,
Phiala Shanahan,
Chase Shimmin,
Savannah Thais
Abstract:
Physical theories grounded in mathematical symmetries are an essential component of our understanding of a wide range of properties of the universe. Similarly, in the domain of machine learning, an awareness of symmetries such as rotation or permutation invariance has driven impressive performance breakthroughs in computer vision, natural language processing, and other important applications. In this report, we argue that both the physics community and the broader machine learning community have much to understand and potentially to gain from a deeper investment in research concerning symmetry group equivariant machine learning architectures. For some applications, the introduction of symmetries into the fundamental structural design can yield models that are more economical (i.e. contain fewer, but more expressive, learned parameters), interpretable (i.e. more explainable or directly mappable to physical quantities), and/or trainable (i.e. more efficient in both data and computational requirements). We discuss various figures of merit for evaluating these models as well as some potential benefits and limitations of these methods for a variety of physics applications. Research and investment into these approaches will lay the foundation for future architectures that are potentially more robust under new computational paradigms and will provide a richer description of the physical systems to which they are applied.
Submitted 11 March, 2022;
originally announced March 2022.
-
Lorentz Group Equivariant Neural Network for Particle Physics
Authors:
Alexander Bogatskiy,
Brandon Anderson,
Jan T. Offermann,
Marwah Roussi,
David W. Miller,
Risi Kondor
Abstract:
We present a neural network architecture that is fully equivariant with respect to transformations under the Lorentz group, a fundamental symmetry of space and time in physics. The architecture is based on the theory of the finite-dimensional representations of the Lorentz group and the equivariant nonlinearity involves the tensor product. For classification tasks in particle physics, we demonstrate that such an equivariant architecture leads to drastically simpler models that have relatively few learnable parameters and are much more physically interpretable than leading approaches that use CNNs and point cloud approaches. The competitive performance of the network is demonstrated on a public classification dataset [27] for tagging top quark decays given energy-momenta of jet constituents produced in proton-proton collisions.
Submitted 8 June, 2020;
originally announced June 2020.
-
Vortex flows on closed surfaces
Authors:
A. Bogatskiy
Abstract:
We investigate the bulk hydrodynamics of the chiral vortex matter on an arbitrary closed surface, extending the ideas of [20, 41]. Placing this important example of a chiral medium onto a curved geometry reveals the geometric nature of odd viscosity. The anomalous odd viscosity of the vortex matter is associated with a special interaction of point vortices with curvature.
Submitted 18 March, 2019;
originally announced March 2019.
-
Edge wave and boundary layer of vortex matter
Authors:
Alexander Bogatskiy,
Paul Wiegmann
Abstract:
We show that a vortex matter, that is a dense assembly of vortices in an incompressible two-dimensional flow, such as a fast rotating superfluid or turbulent flows with sign-like eddies, exhibits (i) a boundary layer of vorticity (vorticity layer), and (ii) a nonlinear wave localized within the vorticity layer, the edge wave. Both are solely an effect of the topological nature of vortices. Both are lost if the vortex matter is approximated as a continuous vorticity patch. The edge wave is governed by the integrable Benjamin-Davis-Ono equation exhibiting solitons with a quantized total vorticity. Quantized solitons reveal the topological nature of the vortices through their dynamics. The edge wave and the vorticity layer are due to odd viscosity of the vortex matter. We also identify the dynamics with the action of the Virasoro-Bott group of diffeomorphisms of the circle, where odd viscosity parametrizes the central extension. Our edge wave is a hydrodynamic analog of the edge states of the fractional quantum Hall effect.
Submitted 18 February, 2019; v1 submitted 3 December, 2018;
originally announced December 2018.
-
Hankel determinant and orthogonal polynomials for a Gaussian weight with a discontinuity at the edge
Authors:
Alexander Bogatskiy,
Tom Claeys,
Alexander Its
Abstract:
We compute asymptotics for Hankel determinants and orthogonal polynomials with respect to a discontinuous Gaussian weight, in a critical regime where the discontinuity is close to the edge of the associated equilibrium measure support. Their behavior is described in terms of the Ablowitz-Segur family of solutions to the Painlevé II equation. Our results complement the ones in [Xu,Zhao,2011]. As consequences of our results, we conjecture asymptotics for an Airy kernel Fredholm determinant and total integral identities for Painlevé II transcendents, and we also prove a new result on the poles of the Ablowitz-Segur solutions to the Painlevé II equation. We also highlight applications of our results in random matrix theory.
Submitted 26 March, 2016; v1 submitted 7 July, 2015;
originally announced July 2015.