-
Koopman Learning with Episodic Memory
Authors:
William T. Redman,
Dean Huang,
Maria Fonoberova,
Igor Mezić
Abstract:
Koopman operator theory has found significant success in learning models of complex, real-world dynamical systems, enabling prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman…
▽ More
Koopman operator theory has found significant success in learning models of complex, real-world dynamical systems, enabling prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman learning with the ability to leverage its own failures. To address this, we equip Koopman methods - developed for predicting non-autonomous time-series - with an episodic memory mechanism, enabling global recall of (or attention to) periods in time where similar dynamics previously occurred. We find that a basic implementation of Koopman learning with episodic memory leads to significant improvements in prediction on synthetic and real-world data. Our framework has considerable potential for expansion, allowing for future advances, and opens exciting new directions for Koopman learning.
△ Less
Submitted 18 October, 2024; v1 submitted 21 November, 2023;
originally announced November 2023.
-
Identifying Equivalent Training Dynamics
Authors:
William T. Redman,
Juan M. Bello-Rivas,
Maria Fonoberova,
Ryan Mohr,
Ioannis G. Kevrekidis,
Igor Mezić
Abstract:
Study of the nonlinear evolution deep neural network (DNN) parameters undergo during training has uncovered regimes of distinct dynamical behavior. While a detailed understanding of these phenomena has the potential to advance improvements in training efficiency and robustness, the lack of methods for identifying when DNN models have equivalent dynamics limits the insight that can be gained from p…
▽ More
Study of the nonlinear evolution deep neural network (DNN) parameters undergo during training has uncovered regimes of distinct dynamical behavior. While a detailed understanding of these phenomena has the potential to advance improvements in training efficiency and robustness, the lack of methods for identifying when DNN models have equivalent dynamics limits the insight that can be gained from prior work. Topological conjugacy, a notion from dynamical systems theory, provides a precise definition of dynamical equivalence, offering a possible route to address this need. However, topological conjugacies have historically been challenging to compute. By leveraging advances in Koopman operator theory, we develop a framework for identifying conjugate and non-conjugate training dynamics. To validate our approach, we demonstrate that it can correctly identify a known equivalence between online mirror descent and online gradient descent. We then utilize it to: identify non-conjugate training dynamics between shallow and wide fully connected neural networks; characterize the early phase of training dynamics in convolutional neural networks; uncover non-conjugate training dynamics in Transformers that do and do not undergo grokking. Our results, across a range of DNN architectures, illustrate the flexibility of our framework and highlight its potential for shedding new light on training dynamics.
△ Less
Submitted 4 June, 2024; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Algorithmic (Semi-)Conjugacy via Koopman Operator Theory
Authors:
William T. Redman,
Maria Fonoberova,
Ryan Mohr,
Ioannis G. Kevrekidis,
Igor Mezić
Abstract:
Iterative algorithms are of utmost importance in decision and control. With an ever growing number of algorithms being developed, distributed, and proprietarized, there is a similarly growing need for methods that can provide classification and comparison. By viewing iterative algorithms as discrete-time dynamical systems, we leverage Koopman operator theory to identify (semi-)conjugacies between…
▽ More
Iterative algorithms are of utmost importance in decision and control. With an ever growing number of algorithms being developed, distributed, and proprietarized, there is a similarly growing need for methods that can provide classification and comparison. By viewing iterative algorithms as discrete-time dynamical systems, we leverage Koopman operator theory to identify (semi-)conjugacies between algorithms using their spectral properties. This provides a general framework with which to classify and compare algorithms.
△ Less
Submitted 13 September, 2022;
originally announced September 2022.
-
An Operator Theoretic View on Pruning Deep Neural Networks
Authors:
William T. Redman,
Maria Fonoberova,
Ryan Mohr,
Ioannis G. Kevrekidis,
Igor Mezic
Abstract:
The discovery of sparse subnetworks that are able to perform as well as full models has found broad applied and theoretical interest. While many pruning methods have been developed to this end, the naïve approach of removing parameters based on their magnitude has been found to be as robust as more complex, state-of-the-art algorithms. The lack of theory behind magnitude pruning's success, especia…
▽ More
The discovery of sparse subnetworks that are able to perform as well as full models has found broad applied and theoretical interest. While many pruning methods have been developed to this end, the naïve approach of removing parameters based on their magnitude has been found to be as robust as more complex, state-of-the-art algorithms. The lack of theory behind magnitude pruning's success, especially pre-convergence, and its relation to other pruning methods, such as gradient based pruning, are outstanding open questions in the field that are in need of being addressed. We make use of recent advances in dynamical systems theory, namely Koopman operator theory, to define a new class of theoretically motivated pruning algorithms. We show that these algorithms can be equivalent to magnitude and gradient based pruning, unifying these seemingly disparate methods, and find that they can be used to shed light on magnitude pruning's performance during the early part of training.
△ Less
Submitted 12 March, 2022; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Universality of Winning Tickets: A Renormalization Group Perspective
Authors:
William T. Redman,
Tianlong Chen,
Zhangyang Wang,
Akshunna S. Dogra
Abstract:
Foundational work on the Lottery Ticket Hypothesis has suggested an exciting corollary: winning tickets found in the context of one task can be transferred to similar tasks, possibly even across different architectures. This has generated broad interest, but methods to study this universality are lacking. We make use of renormalization group theory, a powerful tool from theoretical physics, to add…
▽ More
Foundational work on the Lottery Ticket Hypothesis has suggested an exciting corollary: winning tickets found in the context of one task can be transferred to similar tasks, possibly even across different architectures. This has generated broad interest, but methods to study this universality are lacking. We make use of renormalization group theory, a powerful tool from theoretical physics, to address this need. We find that iterative magnitude pruning, the principal algorithm used for discovering winning tickets, is a renormalization group scheme, and can be viewed as inducing a flow in parameter space. We demonstrate that ResNet-50 models with transferable winning tickets have flows with common properties, as would be expected from the theory. Similar observations are made for BERT models, with evidence that their flows are near fixed points. Additionally, we leverage our framework to study winning tickets transferred across ResNet architectures, observing that smaller models have flows with more uniform properties than larger models, complicating transfer between them.
△ Less
Submitted 15 June, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
On Koopman Mode Decomposition and Tensor Component Analysis
Authors:
William T. Redman
Abstract:
Koopman mode decomposition and tensor component analysis (also known as CANDECOMP/PARAFAC or canonical polyadic decomposition) are two popular approaches of decomposing high dimensional data sets into low dimensional modes that capture the most relevant features and/or dynamics. Despite their similar goal, the two methods are largely used by different scientific communities and formulated in disti…
▽ More
Koopman mode decomposition and tensor component analysis (also known as CANDECOMP/PARAFAC or canonical polyadic decomposition) are two popular approaches of decomposing high dimensional data sets into low dimensional modes that capture the most relevant features and/or dynamics. Despite their similar goal, the two methods are largely used by different scientific communities and formulated in distinct mathematical languages. We examine the two together and show that, under a certain (reasonable) condition on the data, the theoretical decomposition given by tensor component analysis is the \textit{same} as that given by Koopman mode decomposition. This provides a "bridge" with which the two communities should be able to more effectively communicate. When this condition is not met, Koopman mode decomposition still provides a tensor decomposition with an \textit{a priori} computable error, providing an alternative to the non-convex optimization that tensor component analysis requires. Our work provides new possibilities for algorithmic approaches to Koopman mode decomposition and tensor component analysis, provides a new perspective on the success of tensor component analysis, and builds upon a growing body of work showing that dynamical systems, and Koopman operator theory in particular, can be useful for problems that have historically made use of optimization theory.
△ Less
Submitted 15 April, 2021; v1 submitted 2 January, 2021;
originally announced January 2021.
-
Local error quantification for Neural Network Differential Equation solvers
Authors:
Akshunna S. Dogra,
William T Redman
Abstract:
Neural networks have been identified as powerful tools for the study of complex systems. A noteworthy example is the neural network differential equation (NN DE) solver, which can provide functional approximations to the solutions of a wide variety of differential equations. Such solvers produce robust functional expressions, are well suited for further manipulations on the quantities of interest…
▽ More
Neural networks have been identified as powerful tools for the study of complex systems. A noteworthy example is the neural network differential equation (NN DE) solver, which can provide functional approximations to the solutions of a wide variety of differential equations. Such solvers produce robust functional expressions, are well suited for further manipulations on the quantities of interest (for example, taking derivatives), and capable of leveraging the modern advances in parallelization and computing power. However, there is a lack of work on the role precise error quantification can play in their predictions: usually, the focus is on ambiguous and/or global measures of performance like the loss function and/or obtaining global bounds on the errors associated with the predictions. Precise, local error quantification is seldom possible without external means or outright knowledge of the true solution. We address these concerns in the context of dynamical system NN DE solvers, leveraging learnt information within the NN DE solvers to develop methods that allow them to be more accurate and efficient, while still pursuing an unsupervised approach that does not rely on external tools or data. We achieve this via methods that can precisely estimate NN DE solver prediction errors point-wise, thus allowing the user the capacity for efficient and targeted error correction. We exemplify the utility of our methods by testing them on a nonlinear and a chaotic system each.
△ Less
Submitted 28 January, 2021; v1 submitted 24 August, 2020;
originally announced August 2020.
-
Optimizing Neural Networks via Koopman Operator Theory
Authors:
Akshunna S. Dogra,
William T Redman
Abstract:
Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection. As Koopman operator theory is a linear theory, a successful implementation of it in evolving network weights and biases offers the pro…
▽ More
Koopman operator theory, a powerful framework for discovering the underlying dynamics of nonlinear dynamical systems, was recently shown to be intimately connected with neural network training. In this work, we take the first steps in making use of this connection. As Koopman operator theory is a linear theory, a successful implementation of it in evolving network weights and biases offers the promise of accelerated training, especially in the context of deep networks, where optimization is inherently a non-convex problem. We show that Koopman operator theoretic methods allow for accurate predictions of weights and biases of feedforward, fully connected deep networks over a non-trivial range of training time. During this window, we find that our approach is >10x faster than various gradient descent based methods (e.g. Adam, Adadelta, Adagrad), in line with our complexity analysis. We end by highlighting open questions in this exciting intersection between dynamical systems and neural network theory. We highlight additional methods by which our results could be expanded to broader classes of networks and larger training intervals, which shall be the focus of future work.
△ Less
Submitted 21 October, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Renormalization Group as a Koopman Operator
Authors:
William T Redman
Abstract:
Koopman operator theory is shown to be directly related to the renormalization group. This observation allows us, with no assumption of translational invariance, to compute the critical exponents $η$ and $δ$, as well as ratios of critical exponents, of classical spin systems from single observables alone. This broadens the types of problems that the renormalization group framework can be applied t…
▽ More
Koopman operator theory is shown to be directly related to the renormalization group. This observation allows us, with no assumption of translational invariance, to compute the critical exponents $η$ and $δ$, as well as ratios of critical exponents, of classical spin systems from single observables alone. This broadens the types of problems that the renormalization group framework can be applied to and establish universality classes of. In addition, this connection may allow for a new, data-driven way in which to find the renormalization group fixed point(s), and their relevant and irrelevant directions.
△ Less
Submitted 8 June, 2020; v1 submitted 30 December, 2019;
originally announced December 2019.
-
An O(n) method of calculating Kendall correlations of spike trains
Authors:
William T Redman
Abstract:
The ability to record from increasingly large numbers of neurons, and the increasing attention being paid to large scale neural network simulations, demands computationally fast algorithms to compute relevant statistical measures. We present an O(n) algorithm for calculating the Kendall correlation of spike trains, a correlation measure that is becoming especially recognized as an important tool i…
▽ More
The ability to record from increasingly large numbers of neurons, and the increasing attention being paid to large scale neural network simulations, demands computationally fast algorithms to compute relevant statistical measures. We present an O(n) algorithm for calculating the Kendall correlation of spike trains, a correlation measure that is becoming especially recognized as an important tool in neuroscience. We show that our method is around 50 times faster than the O (n ln n) method which is a current standard for quickly computing the Kendall correlation. In addition to providing a faster algorithm, we emphasize the role that taking the specific nature of spike trains had on reducing the run time. We imagine that there are many other useful algorithms that can be even more significantly sped up when taking this into consideration. A MATLAB function executing the method described here has been made freely available on-line.
△ Less
Submitted 25 February, 2020; v1 submitted 7 June, 2018;
originally announced June 2018.
-
A General Approach to Coding in Early Olfactory and Visual Neural Populations
Authors:
William T Redman
Abstract:
Recent experimental and theoretical work on neural populations belonging to two separate early sensory systems, olfaction and vision, has challenged the notion that the two operate under different computational paradigms by providing evidence for the respective neural population codes having three central, common features: they are highly redundant; they are organized such that information is carr…
▽ More
Recent experimental and theoretical work on neural populations belonging to two separate early sensory systems, olfaction and vision, has challenged the notion that the two operate under different computational paradigms by providing evidence for the respective neural population codes having three central, common features: they are highly redundant; they are organized such that information is carried in the identity, and not the relative timing, of the active neurons; they are capable of error correction. We present the first model that captures these three properties in a general manner, making it possible to investigate whether similar structure is present in other population codes. Our model also makes specific predictions about additional, as yet unseen, structure in such codes. If these predictions are found in real data, this would provide new evidence that such population codes are operating under more general computational principles.
△ Less
Submitted 12 August, 2018; v1 submitted 9 October, 2017;
originally announced October 2017.