subscribe to arXiv mailings

Designing Mechanical Meta-Materials by Learning Equivariant Flows

Authors: Mehran Mirramezani, Anne S. Meeussen, Katia Bertoldi, Peter Orbanz, Ryan P. Adams

Abstract: Mechanical meta-materials are solids whose geometric structure results in exotic nonlinear behaviors that are not typically achievable via homogeneous materials. We show how to drastically expand the design space of a class of mechanical meta-materials known as cellular solids, by generalizing beyond translational symmetry. This is made possible by transforming a reference geometry according to a… ▽ More Mechanical meta-materials are solids whose geometric structure results in exotic nonlinear behaviors that are not typically achievable via homogeneous materials. We show how to drastically expand the design space of a class of mechanical meta-materials known as cellular solids, by generalizing beyond translational symmetry. This is made possible by transforming a reference geometry according to a divergence free flow that is parameterized by a neural network and equivariant under the relevant symmetry group. We show how to construct flows equivariant to the space groups, despite the fact that these groups are not compact. Coupling this flow with a differentiable nonlinear mechanics simulator allows us to represent a much richer set of cellular solids than was previously possible. These materials can be optimized to exhibit desirable mechanical properties such as negative Poisson's ratios or to match target stress-strain curves. We validate these new designs in simulation and by fabricating real-world prototypes. We find that designs with higher-order symmetries can exhibit a wider range of behaviors. △ Less

Submitted 3 October, 2024; originally announced October 2024.

arXiv:2409.02606 [pdf, other]

Real-time design of architectural structures with differentiable mechanics and neural networks

Authors: Rafael Pastrana, Eder Medina, Isabel M. de Oliveira, Sigrid Adriaenssens, Ryan P. Adams

Abstract: Designing mechanically efficient geometry for architectural structures like shells, towers, and bridges is an expensive iterative process. Existing techniques for solving such inverse mechanical problems rely on traditional direct optimization methods, which are slow and computationally expensive, limiting iteration speed and design exploration. Neural networks would seem to offer a solution, via… ▽ More Designing mechanically efficient geometry for architectural structures like shells, towers, and bridges is an expensive iterative process. Existing techniques for solving such inverse mechanical problems rely on traditional direct optimization methods, which are slow and computationally expensive, limiting iteration speed and design exploration. Neural networks would seem to offer a solution, via data-driven amortized optimization for specific design tasks, but they often require extensive fine-tuning and cannot ensure that important design criteria, such as mechanical integrity, are met. In this work, we combine neural networks with a differentiable mechanics simulator to develop a model that accelerates the solution of shape approximation problems for architectural structures modeled as bar systems. As a result, our model offers explicit guarantees to satisfy mechanical constraints while generating designs that match target geometries. We validate our model in two tasks, the design of masonry shells and cable-net towers. Our model achieves better accuracy and generalization than fully neural alternatives, and comparable accuracy to direct optimization but in real time, enabling fast and sound design exploration. We further demonstrate the real-world potential of our trained model by deploying it in 3D modeling software and by fabricating a physical prototype. Our work opens up new opportunities for accelerated physical design enhanced by neural networks for the built environment. △ Less

Submitted 3 October, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

arXiv:2407.00494 [pdf, other]

Graph Neural Networks Gone Hogwild

Authors: Olga Solodova, Nick Richardson, Deniz Oktay, Ryan P. Adams

Abstract: Message passing graph neural networks (GNNs) would appear to be powerful tools to learn distributed algorithms via gradient descent, but generate catastrophically incorrect predictions when nodes update asynchronously during inference. This failure under asynchrony effectively excludes these architectures from many potential applications, such as learning local communication policies between resou… ▽ More Message passing graph neural networks (GNNs) would appear to be powerful tools to learn distributed algorithms via gradient descent, but generate catastrophically incorrect predictions when nodes update asynchronously during inference. This failure under asynchrony effectively excludes these architectures from many potential applications, such as learning local communication policies between resource-constrained agents in, e.g., robotic swarms or sensor networks. In this work we explore why this failure occurs in common GNN architectures, and identify "implicitly-defined" GNNs as a class of architectures which is provably robust to partially asynchronous "hogwild" inference, adapting convergence guarantees from work in asynchronous and distributed optimization, e.g., Bertsekas (1982); Niu et al. (2011). We then propose a novel implicitly-defined GNN architecture, which we call an energy GNN. We show that this architecture outperforms other GNNs from this class on a variety of synthetic tasks inspired by multi-agent systems, and achieves competitive performance on real-world datasets. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2403.08078 [pdf, other]

Automated discovery of reprogrammable nonlinear dynamic metamaterials

Authors: Giovanni Bordiga, Eder Medina, Sina Jafarzadeh, Cyrill Boesch, Ryan P. Adams, Vincent Tournat, Katia Bertoldi

Abstract: Harnessing the rich nonlinear dynamics of highly-deformable materials has the potential to unlock the next generation of functional smart materials and devices. However, unlocking such potential requires effective strategies to spatially design optimal material architectures for desired nonlinear dynamic responses such as guiding of nonlinear elastic waves, energy focusing, and cloaking. Here, we… ▽ More Harnessing the rich nonlinear dynamics of highly-deformable materials has the potential to unlock the next generation of functional smart materials and devices. However, unlocking such potential requires effective strategies to spatially design optimal material architectures for desired nonlinear dynamic responses such as guiding of nonlinear elastic waves, energy focusing, and cloaking. Here, we introduce an inverse-design framework for the discovery of flexible mechanical metamaterials with a target nonlinear dynamic response. The desired dynamic task is encoded via optimal tuning of the full-scale metamaterial geometry through an inverse-design approach powered by a custom-developed fully-differentiable simulation environment. By deploying such strategy, we design mechanical metamaterials tailored for energy focusing, energy splitting, dynamic protection, and nonlinear motion conversion. Furthermore, we illustrate that our design framework can be expanded to automatically discover reprogrammable architectures capable of switching between different dynamic tasks. For instance, we encode two strongly competing tasks -- energy focusing and dynamic protection -- within a single architecture, utilizing static pre-compression to switch between these behaviors. The discovered designs are physically realized and experimentally tested, demonstrating the robustness of the engineered tasks. All together, our approach opens an untapped avenue towards designer materials with tailored robotic-like reprogrammable functionalities. △ Less

Submitted 12 March, 2024; originally announced March 2024.

Comments: 8 pages, 5 figures

arXiv:2310.19798 [pdf, other]

doi 10.1145/3623263.3623364

Gradient-Based Dovetail Joint Shape Optimization for Stiffness

Authors: Xingyuan Sun, Chenyue Cai, Ryan P. Adams, Szymon Rusinkiewicz

Abstract: It is common to manufacture an object by decomposing it into parts that can be assembled. This decomposition is often required by size limits of the machine, the complex structure of the shape, etc. To make it possible to easily assemble the final object, it is often desirable to design geometry that enables robust connections between the subcomponents. In this project, we study the task of doveta… ▽ More It is common to manufacture an object by decomposing it into parts that can be assembled. This decomposition is often required by size limits of the machine, the complex structure of the shape, etc. To make it possible to easily assemble the final object, it is often desirable to design geometry that enables robust connections between the subcomponents. In this project, we study the task of dovetail-joint shape optimization for stiffness using gradient-based optimization. This optimization requires a differentiable simulator that is capable of modeling the contact between the two parts of a joint, making it possible to reason about the gradient of the stiffness with respect to shape parameters. Our simulation approach uses a penalty method that alternates between optimizing each side of the joint, using the adjoint method to compute gradients. We test our method by optimizing the joint shapes in three different joint shape spaces, and evaluate optimized joint shapes in both simulation and real-world tests. The experiments show that optimized joint shapes achieve higher stiffness, both synthetically and in real-world tests. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Comments: ACM SCF 2023: Proceedings of the 8th Annual ACM Symposium on Computational Fabrication

arXiv:2310.12920 [pdf, other]

Generative Marginalization Models

Authors: Sulin Liu, Peter J. Ramadge, Ryan P. Adams

Abstract: We introduce marginalization models (MAMs), a new family of generative models for high-dimensional discrete data. They offer scalable and flexible generative modeling by explicitly modeling all induced marginal distributions. Marginalization models enable fast approximation of arbitrary marginal probabilities with a single forward pass of the neural network, which overcomes a major limitation of a… ▽ More We introduce marginalization models (MAMs), a new family of generative models for high-dimensional discrete data. They offer scalable and flexible generative modeling by explicitly modeling all induced marginal distributions. Marginalization models enable fast approximation of arbitrary marginal probabilities with a single forward pass of the neural network, which overcomes a major limitation of arbitrary marginal inference models, such as any-order autoregressive models. MAMs also address the scalability bottleneck encountered in training any-order generative models for high-dimensional problems under the context of energy-based training, where the goal is to match the learned distribution to a given desired probability (specified by an unnormalized log-probability function such as energy or reward function). We propose scalable methods for learning the marginals, grounded in the concept of "marginalization self-consistency". We demonstrate the effectiveness of the proposed model on a variety of discrete data distributions, including images, text, physical systems, and molecules, for maximum likelihood and energy-based training settings. MAMs achieve orders of magnitude speedup in evaluating the marginal probabilities on both settings. For energy-based training tasks, MAMs enable any-order generative modeling of high-dimensional problems beyond the scale of previous methods. Code is available at https://github.com/PrincetonLIPS/MaM. △ Less

Submitted 6 October, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: ICML 2024

arXiv:2309.04970 [pdf, other]

A rapid and automated computational approach to the design of multistable soft actuators

Authors: Mehran Mirramezani, Deniz Oktay, Ryan P. Adams

Abstract: We develop an automated computational modeling framework for rapid gradient-based design of multistable soft mechanical structures composed of non-identical bistable unit cells with appropriate geometric parameterization. This framework includes a custom isogeometric analysis-based continuum mechanics solver that is robust and end-to-end differentiable, which enables geometric and material optimiz… ▽ More We develop an automated computational modeling framework for rapid gradient-based design of multistable soft mechanical structures composed of non-identical bistable unit cells with appropriate geometric parameterization. This framework includes a custom isogeometric analysis-based continuum mechanics solver that is robust and end-to-end differentiable, which enables geometric and material optimization to achieve a desired multistability pattern. We apply this numerical modeling approach in two dimensions to design a variety of multistable structures, accounting for various geometric and material constraints. Our framework demonstrates consistent agreement with experimental results, and robust performance in designing for multistability, which facilities soft actuator design with high precision and reliability. △ Less

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2307.12407 [pdf, other]

JAX FDM: A differentiable solver for inverse form-finding

Authors: Rafael Pastrana, Deniz Oktay, Ryan P. Adams, Sigrid Adriaenssens

Abstract: We introduce JAX FDM, a differentiable solver to design mechanically efficient shapes for 3D structures conditioned on target architectural, fabrication and structural properties. Examples of such structures are domes, cable nets and towers. JAX FDM solves these inverse form-finding problems by combining the force density method, differentiable sparsity and gradient-based optimization. Our solver… ▽ More We introduce JAX FDM, a differentiable solver to design mechanically efficient shapes for 3D structures conditioned on target architectural, fabrication and structural properties. Examples of such structures are domes, cable nets and towers. JAX FDM solves these inverse form-finding problems by combining the force density method, differentiable sparsity and gradient-based optimization. Our solver can be paired with other libraries in the JAX ecosystem to facilitate the integration of form-finding simulations with neural networks. We showcase the features of JAX FDM with two design examples. JAX FDM is available as an open-source library at https://github.com/arpastrana/jax_fdm. △ Less

Submitted 28 October, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: https://github.com/arpastrana/jax_fdm

Journal ref: 40th International Conference on Machine Learning (2023: Differentiable Almost Everything Workshop

arXiv:2306.05261 [pdf, other]

Representing and Learning Functions Invariant Under Crystallographic Groups

Authors: Ryan P. Adams, Peter Orbanz

Abstract: Crystallographic groups describe the symmetries of crystals and other repetitive structures encountered in nature and the sciences. These groups include the wallpaper and space groups. We derive linear and nonlinear representations of functions that are (1) smooth and (2) invariant under such a group. The linear representation generalizes the Fourier basis to crystallographically invariant basis f… ▽ More Crystallographic groups describe the symmetries of crystals and other repetitive structures encountered in nature and the sciences. These groups include the wallpaper and space groups. We derive linear and nonlinear representations of functions that are (1) smooth and (2) invariant under such a group. The linear representation generalizes the Fourier basis to crystallographically invariant basis functions. We show that such a basis exists for each crystallographic group, that it is orthonormal in the relevant $L_2$ space, and recover the standard Fourier basis as a special case for pure shift groups. The nonlinear representation embeds the orbit space of the group into a finite-dimensional Euclidean space. We show that such an embedding exists for every crystallographic group, and that it factors functions through a generalization of a manifold called an orbifold. We describe algorithms that, given a standardized description of the group, compute the Fourier basis and an embedding map. As examples, we construct crystallographically invariant neural networks, kernel machines, and Gaussian processes. △ Less

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2302.00032 [pdf, other]

Neuromechanical Autoencoders: Learning to Couple Elastic and Neural Network Nonlinearity

Authors: Deniz Oktay, Mehran Mirramezani, Eder Medina, Ryan P. Adams

Abstract: Intelligent biological systems are characterized by their embodiment in a complex environment and the intimate interplay between their nervous systems and the nonlinear mechanical properties of their bodies. This coordination, in which the dynamics of the motor system co-evolved to reduce the computational burden on the brain, is referred to as ``mechanical intelligence'' or ``morphological comput… ▽ More Intelligent biological systems are characterized by their embodiment in a complex environment and the intimate interplay between their nervous systems and the nonlinear mechanical properties of their bodies. This coordination, in which the dynamics of the motor system co-evolved to reduce the computational burden on the brain, is referred to as ``mechanical intelligence'' or ``morphological computation''. In this work, we seek to develop machine learning analogs of this process, in which we jointly learn the morphology of complex nonlinear elastic solids along with a deep neural network to control it. By using a specialized differentiable simulator of elastic mechanics coupled to conventional deep learning architectures -- which we refer to as neuromechanical autoencoders -- we are able to learn to perform morphological computation via gradient descent. Key to our approach is the use of mechanical metamaterials -- cellular solids, in particular -- as the morphological substrate. Just as deep neural networks provide flexible and massively-parametric function approximators for perceptual and control tasks, cellular solid metamaterials are promising as a rich and learnable space for approximating a variety of actuation tasks. In this work we take advantage of these complementary computational concepts to co-design materials and neural network controls to achieve nonintuitive mechanical behavior. We demonstrate in simulation how it is possible to achieve translation, rotation, and shape matching, as well as a ``digital MNIST'' task. We additionally manufacture and evaluate one of the designs to verify its real-world behavior. △ Less

Submitted 31 January, 2023; originally announced February 2023.

Comments: ICLR 2023 Spotlight

arXiv:2211.01604 [pdf, other]

Meta-PDE: Learning to Solve PDEs Quickly Without a Mesh

Authors: Tian Qin, Alex Beatson, Deniz Oktay, Nick McGreivy, Ryan P. Adams

Abstract: Partial differential equations (PDEs) are often computationally challenging to solve, and in many settings many related PDEs must be be solved either at every timestep or for a variety of candidate boundary conditions, parameters, or geometric domains. We present a meta-learning based method which learns to rapidly solve problems from a distribution of related PDEs. We use meta-learning (MAML and… ▽ More Partial differential equations (PDEs) are often computationally challenging to solve, and in many settings many related PDEs must be be solved either at every timestep or for a variety of candidate boundary conditions, parameters, or geometric domains. We present a meta-learning based method which learns to rapidly solve problems from a distribution of related PDEs. We use meta-learning (MAML and LEAP) to identify initializations for a neural network representation of the PDE solution such that a residual of the PDE can be quickly minimized on a novel task. We apply our meta-solving approach to a nonlinear Poisson's equation, 1D Burgers' equation, and hyperelasticity equations with varying parameters, geometries, and boundary conditions. The resulting Meta-PDE method finds qualitatively accurate solutions to most problems within a few gradient steps; for the nonlinear Poisson and hyper-elasticity equation this results in an intermediate accuracy approximation up to an order of magnitude faster than a baseline finite element analysis (FEA) solver with equivalent accuracy. In comparison to other learned solvers and surrogate models, this meta-learning approach can be trained without supervision from expensive ground-truth data, does not require a mesh, and can even be used when the geometry and topology varies between tasks. △ Less

Submitted 3 November, 2022; originally announced November 2022.

arXiv:2210.01534 [pdf, other]

Multi-fidelity Monte Carlo: a pseudo-marginal approach

Authors: Diana Cai, Ryan P. Adams

Abstract: Markov chain Monte Carlo (MCMC) is an established approach for uncertainty quantification and propagation in scientific applications. A key challenge in applying MCMC to scientific domains is computation: the target density of interest is often a function of expensive computations, such as a high-fidelity physical simulation, an intractable integral, or a slowly-converging iterative algorithm. Thu… ▽ More Markov chain Monte Carlo (MCMC) is an established approach for uncertainty quantification and propagation in scientific applications. A key challenge in applying MCMC to scientific domains is computation: the target density of interest is often a function of expensive computations, such as a high-fidelity physical simulation, an intractable integral, or a slowly-converging iterative algorithm. Thus, using an MCMC algorithms with an expensive target density becomes impractical, as these expensive computations need to be evaluated at each iteration of the algorithm. In practice, these computations often approximated via a cheaper, low-fidelity computation, leading to bias in the resulting target density. Multi-fidelity MCMC algorithms combine models of varying fidelities in order to obtain an approximate target density with lower computational cost. In this paper, we describe a class of asymptotically exact multi-fidelity MCMC algorithms for the setting where a sequence of models of increasing fidelity can be computed that approximates the expensive target density of interest. We take a pseudo-marginal MCMC approach for multi-fidelity inference that utilizes a cheaper, randomized-fidelity unbiased estimator of the target fidelity constructed via random truncation of a telescoping series of the low-fidelity sequence of models. Finally, we discuss and evaluate the proposed multi-fidelity MCMC approach on several applications, including log-Gaussian Cox process modeling, Bayesian ODE system identification, PDE-constrained optimization, and Gaussian process regression parameter inference. △ Less

Submitted 4 October, 2022; originally announced October 2022.

Comments: 22 pages, 7 figures

arXiv:2205.16008 [pdf, other]

doi 10.1145/3623263.3623356

More Stiffness with Less Fiber: End-to-End Fiber Path Optimization for 3D-Printed Composites

Authors: Xingyuan Sun, Geoffrey Roeder, Tianju Xue, Ryan P. Adams, Szymon Rusinkiewicz

Abstract: In 3D printing, stiff fibers (e.g., carbon fiber) can reinforce thermoplastic polymers with limited stiffness. However, existing commercial digital manufacturing software only provides a few simple fiber layout algorithms, which solely use the geometry of the shape. In this work, we build an automated fiber path planning algorithm that maximizes the stiffness of a 3D print given specified external… ▽ More In 3D printing, stiff fibers (e.g., carbon fiber) can reinforce thermoplastic polymers with limited stiffness. However, existing commercial digital manufacturing software only provides a few simple fiber layout algorithms, which solely use the geometry of the shape. In this work, we build an automated fiber path planning algorithm that maximizes the stiffness of a 3D print given specified external loads. We formalize this as an optimization problem: an objective function is designed to measure the stiffness of the object while regularizing certain properties of fiber paths (e.g., smoothness). To initialize each fiber path, we use finite element analysis to calculate the stress field on the object and greedily "walk" in the direction of the stress field. We then apply a gradient-based optimization algorithm that uses the adjoint method to calculate the gradient of stiffness with respect to fiber layout. We compare our approach, in both simulation and real-world experiments, to three baselines: (1) concentric fiber rings generated by Eiger, a leading digital manufacturing software package developed by Markforged, (2) greedy extraction on the simulated stress field (i.e., our method without optimization), and (3) the greedy algorithm on a fiber orientation field calculated by smoothing the simulated stress fields. The results show that objects with fiber paths generated by our algorithm achieve greater stiffness while using less fiber than the baselines--our algorithm improves the Pareto frontier of object stiffness as a function of fiber usage. Ablation studies show that the smoothing regularizer is needed for feasible fiber paths and stability of optimization, and multi-resolution optimization helps reduce the running time compared to single-resolution optimization. △ Less

Submitted 29 October, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: ACM SCF 2023: Proceedings of the 8th Annual ACM Symposium on Computational Fabrication

arXiv:2112.12210 [pdf, other]

ProBF: Learning Probabilistic Safety Certificates with Barrier Functions

Authors: Athindran Ramesh Kumar, Sulin Liu, Jaime F. Fisac, Ryan P. Adams, Peter J. Ramadge

Abstract: Safety-critical applications require controllers/policies that can guarantee safety with high confidence. The control barrier function is a useful tool to guarantee safety if we have access to the ground-truth system dynamics. In practice, we have inaccurate knowledge of the system dynamics, which can lead to unsafe behaviors due to unmodeled residual dynamics. Learning the residual dynamics with… ▽ More Safety-critical applications require controllers/policies that can guarantee safety with high confidence. The control barrier function is a useful tool to guarantee safety if we have access to the ground-truth system dynamics. In practice, we have inaccurate knowledge of the system dynamics, which can lead to unsafe behaviors due to unmodeled residual dynamics. Learning the residual dynamics with deterministic machine learning models can prevent the unsafe behavior but can fail when the predictions are imperfect. In this situation, a probabilistic learning method that reasons about the uncertainty of its predictions can help provide robust safety margins. In this work, we use a Gaussian process to model the projection of the residual dynamics onto a control barrier function. We propose a novel optimization procedure to generate safe controls that can guarantee safety with high probability. The safety filter is provided with the ability to reason about the uncertainty of the predictions from the GP. We show the efficacy of this method through experiments on Segway and Quadrotor simulations. Our proposed probabilistic approach is able to reduce the number of safety violations significantly as compared to the deterministic approach with a neural network. △ Less

Submitted 23 December, 2021; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: Presented at NeurIPS 2021 workshop - Safe and Robust Control of Uncertain Systems

arXiv:2109.14124 [pdf, other]

Vitruvion: A Generative Model of Parametric CAD Sketches

Authors: Ari Seff, Wenda Zhou, Nick Richardson, Ryan P. Adams

Abstract: Parametric computer-aided design (CAD) tools are the predominant way that engineers specify physical structures, from bicycle pedals to airplanes to printed circuit boards. The key characteristic of parametric CAD is that design intent is encoded not only via geometric primitives, but also by parameterized constraints between the elements. This relational specification can be viewed as the constru… ▽ More Parametric computer-aided design (CAD) tools are the predominant way that engineers specify physical structures, from bicycle pedals to airplanes to printed circuit boards. The key characteristic of parametric CAD is that design intent is encoded not only via geometric primitives, but also by parameterized constraints between the elements. This relational specification can be viewed as the construction of a constraint program, allowing edits to coherently propagate to other parts of the design. Machine learning offers the intriguing possibility of accelerating the design process via generative modeling of these structures, enabling new tools such as autocompletion, constraint inference, and conditional synthesis. In this work, we present such an approach to generative modeling of parametric CAD sketches, which constitute the basic computational building blocks of modern mechanical design. Our model, trained on real-world designs from the SketchGraphs dataset, autoregressively synthesizes sketches as sequences of primitives, with initial coordinates, and constraints that reference back to the sampled primitives. As samples from the model match the constraint graph representation used in standard CAD software, they may be directly imported, solved, and edited according to downstream design tasks. In addition, we condition the model on various contexts, including partial sketches (primers) and images of hand-drawn sketches. Evaluation of the proposed approach demonstrates its ability to synthesize realistic CAD sketches and its potential to aid the mechanical design workflow. △ Less

Submitted 28 April, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: ICLR camera ready

Journal ref: ICLR 2022

arXiv:2107.06277 [pdf, other]

Why Generalization in RL is Difficult: Epistemic POMDPs and Implicit Partial Observability

Authors: Dibya Ghosh, Jad Rahme, Aviral Kumar, Amy Zhang, Ryan P. Adams, Sergey Levine

Abstract: Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the sequential structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic… ▽ More Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the sequential structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic uncertainty, we show that, perhaps surprisingly, this is not the case in RL. We show that generalization to unseen test conditions from a limited number of training conditions induces implicit partial observability, effectively turning even fully-observed MDPs into POMDPs. Informed by this observation, we recast the problem of generalization in RL as solving the induced partially observed Markov decision process, which we call the epistemic POMDP. We demonstrate the failure modes of algorithms that do not appropriately handle this partial observability, and suggest a simple ensemble-based technique for approximately solving the partially observed problem. Empirically, we demonstrate that our simple algorithm derived from the epistemic POMDP achieves significant gains in generalization over current methods on the Procgen benchmark suite. △ Less

Submitted 13 July, 2021; originally announced July 2021.

Comments: First two authors contributed equally

arXiv:2106.09019 [pdf, other]

Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate

Authors: Xingyuan Sun, Tianju Xue, Szymon Rusinkiewicz, Ryan P. Adams

Abstract: In design, fabrication, and control problems, we are often faced with the task of synthesis, in which we must generate an object or configuration that satisfies a set of constraints while maximizing one or more objective functions. The synthesis problem is typically characterized by a physical process in which many different realizations may achieve the goal. This many-to-one map presents challeng… ▽ More In design, fabrication, and control problems, we are often faced with the task of synthesis, in which we must generate an object or configuration that satisfies a set of constraints while maximizing one or more objective functions. The synthesis problem is typically characterized by a physical process in which many different realizations may achieve the goal. This many-to-one map presents challenges to the supervised learning of feed-forward synthesis, as the set of viable designs may have a complex structure. In addition, the non-differentiable nature of many physical simulations prevents efficient direct optimization. We address both of these problems with a two-stage neural network architecture that we may consider to be an autoencoder. We first learn the decoder: a differentiable surrogate that approximates the many-to-one physical realization process. We then learn the encoder, which maps from goal to design, while using the fixed decoder to evaluate the quality of the realization. We evaluate the approach on two case studies: extruder path planning in additive manufacturing and constrained soft robot inverse kinematics. We compare our approach to direct optimization of the design using the learned surrogate, and to supervised learning of the synthesis problem. We find that our approach produces higher quality solutions than supervised learning, while being competitive in quality with direct optimization, at a greatly reduced computational cost. △ Less

Submitted 5 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021, Spotlight. Source code: https://github.com/xingyuansun/amorsyn

arXiv:2103.14224 [pdf, other]

Active multi-fidelity Bayesian online changepoint detection

Authors: Gregory W. Gundersen, Diana Cai, Chuteng Zhou, Barbara E. Engelhardt, Ryan P. Adams

Abstract: Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., to edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement aff… ▽ More Online algorithms for detecting changepoints, or abrupt shifts in the behavior of a time series, are often deployed with limited resources, e.g., to edge computing settings such as mobile phones or industrial sensors. In these scenarios it may be beneficial to trade the cost of collecting an environmental measurement against the quality or "fidelity" of this measurement and how the measurement affects changepoint estimation. For instance, one might decide between inertial measurements or GPS to determine changepoints for motion. A Bayesian approach to changepoint detection is particularly appealing because we can represent our posterior uncertainty about changepoints and make active, cost-sensitive decisions about data fidelity to reduce this posterior uncertainty. Moreover, the total cost could be dramatically lowered through active fidelity switching, while remaining robust to changes in data distribution. We propose a multi-fidelity approach that makes cost-sensitive decisions about which data fidelity to collect based on maximizing information gain with respect to changepoints. We evaluate this framework on synthetic, video, and audio data and show that this information-based approach results in accurate predictions while reducing total cost. △ Less

Submitted 25 July, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

Comments: 37th Conference on Uncertainty in Artificial Intelligence

arXiv:2007.10412 [pdf, other]

Randomized Automatic Differentiation

Authors: Deniz Oktay, Nick McGreivy, Joshua Aduol, Alex Beatson, Ryan P. Adams

Abstract: The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stoc… ▽ More The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which can allow unbiased gradient estimates to be computed with reduced memory in return for variance. We examine limitations of the general approach, and argue that we must leverage problem specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor. △ Less

Submitted 13 March, 2021; v1 submitted 20 July, 2020; originally announced July 2020.

Comments: ICLR 2021

arXiv:2007.08506 [pdf, other]

SketchGraphs: A Large-Scale Dataset for Modeling Relational Geometry in Computer-Aided Design

Authors: Ari Seff, Yaniv Ovadia, Wenda Zhou, Ryan P. Adams

Abstract: Parametric computer-aided design (CAD) is the dominant paradigm in mechanical engineering for physical design. Distinguished by relational geometry, parametric CAD models begin as two-dimensional sketches consisting of geometric primitives (e.g., line segments, arcs) and explicit constraints between them (e.g., coincidence, perpendicularity) that form the basis for three-dimensional construction o… ▽ More Parametric computer-aided design (CAD) is the dominant paradigm in mechanical engineering for physical design. Distinguished by relational geometry, parametric CAD models begin as two-dimensional sketches consisting of geometric primitives (e.g., line segments, arcs) and explicit constraints between them (e.g., coincidence, perpendicularity) that form the basis for three-dimensional construction operations. Training machine learning models to reason about and synthesize parametric CAD designs has the potential to reduce design time and enable new design workflows. Additionally, parametric CAD designs can be viewed as instances of constraint programming and they offer a well-scoped test bed for exploring ideas in program synthesis and induction. To facilitate this research, we introduce SketchGraphs, a collection of 15 million sketches extracted from real-world CAD models coupled with an open-source data processing pipeline. Each sketch is represented as a geometric constraint graph where edges denote designer-imposed geometric relationships between primitives, the nodes of the graph. We demonstrate and establish benchmarks for two use cases of the dataset: generative modeling of sketches and conditional generation of likely constraints given unconstrained geometry. △ Less

Submitted 16 July, 2020; originally announced July 2020.

arXiv:2005.06549 [pdf, other]

Learning Composable Energy Surrogates for PDE Order Reduction

Authors: Alex Beatson, Jordan T. Ash, Geoffrey Roeder, Tianju Xue, Ryan P. Adams

Abstract: Meta-materials are an important emerging class of engineered materials in which complex macroscopic behaviour--whether electromagnetic, thermal, or mechanical--arises from modular substructure. Simulation and optimization of these materials are computationally challenging, as rich substructures necessitate high-fidelity finite element meshes to solve the governing PDEs. To address this, we leverag… ▽ More Meta-materials are an important emerging class of engineered materials in which complex macroscopic behaviour--whether electromagnetic, thermal, or mechanical--arises from modular substructure. Simulation and optimization of these materials are computationally challenging, as rich substructures necessitate high-fidelity finite element meshes to solve the governing PDEs. To address this, we leverage parametric modular structure to learn component-level surrogates, enabling cheaper high-fidelity simulation. We use a neural network to model the stored potential energy in a component given boundary conditions. This yields a structured prediction task: macroscopic behavior is determined by the minimizer of the system's total potential energy, which can be approximated by composing these surrogate models. Composable energy surrogates thus permit simulation in the reduced basis of component boundaries. Costly ground-truth simulation of the full structure is avoided, as training data are generated by performing finite element analysis with individual components. Using dataset aggregation to choose training boundary conditions allows us to learn energy surrogates which produce accurate macroscopic behavior when composed, accelerating simulation of parametric meta-materials. △ Less

Submitted 15 May, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

arXiv:2004.00353 [pdf, other]

SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models

Authors: Yucen Luo, Alex Beatson, Mohammad Norouzi, Jun Zhu, David Duvenaud, Ryan P. Adams, Ricky T. Q. Chen

Abstract: Standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimiz… ▽ More Standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models based on randomized truncation of infinite series. If parameterized by an encoder-decoder architecture, the parameters of the encoder can be optimized to minimize its variance of this estimator. We show that models trained using our estimator give better test-set likelihoods than a standard importance-sampling based approach for the same average computational cost. This estimator also allows use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions. △ Less

Submitted 10 July, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

Comments: ICLR 2020

arXiv:1910.08475 [pdf, other]

On Warm-Starting Neural Network Training

Authors: Jordan T. Ash, Ryan P. Adams

Abstract: In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models tha… ▽ More In many real-world deployments of machine learning systems, data arrive piecemeal. These learning scenarios may be passive, where data arrive incrementally due to structural properties of the problem (e.g., daily financial data) or active, where samples are selected according to a measure of their quality (e.g., experimental design). In both of these cases, we are building a sequence of models that incorporate an increasing amount of data. We would like each of these models in the sequence to be performant and take advantage of all the data that are available to that point. Conventional intuition suggests that when solving a sequence of related optimization problems of this form, it should be possible to initialize using the solution of the previous iterate -- to "warm start" the optimization rather than initialize from scratch -- and see reductions in wall-clock time. However, in practice this warm-starting seems to yield poorer generalization performance than models that have fresh random initializations, even though the final training losses are similar. While it appears that some hyperparameter settings allow a practitioner to close this generalization gap, they seem to only do so in regimes that damage the wall-clock gains of the warm start. Nevertheless, it is highly desirable to be able to warm-start neural network training, as it would dramatically reduce the resource usage associated with the construction of performant deep learning systems. In this work, we take a closer look at this empirical phenomenon and try to understand when and how it occurs. We also provide a surprisingly simple trick that overcomes this pathology in several important situations, and present experiments that elucidate some of its properties. △ Less

Submitted 31 December, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

Journal ref: 2020 Advances in Neural Information Processing Systems

arXiv:1907.08268 [pdf, other]

Discrete Object Generation with Reversible Inductive Construction

Authors: Ari Seff, Wenda Zhou, Farhan Damani, Abigail Doyle, Ryan P. Adams

Abstract: The success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs. However, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn. Additionally, structured discrete domains are often characterized by… ▽ More The success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs. However, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn. Additionally, structured discrete domains are often characterized by strict constraints on what constitutes a valid object and generative models must respect these requirements in order to produce useful novel samples. Here, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity. Building off of generative interpretations of denoising autoencoders, the Markov chain alternates between producing 1) a sequence of corrupted objects that are valid but not from the data distribution, and 2) a learned reconstruction distribution that attempts to fix the corruptions while also preserving validity. This approach constrains the generative model to only produce valid objects, requires the learner to only discover local modifications to the objects, and avoids marginalization over an unknown and potentially large space of construction histories. We evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics. △ Less

Submitted 31 October, 2019; v1 submitted 18 July, 2019; originally announced July 2019.

arXiv:1906.10228 [pdf, ps, other]

A Theoretical Connection Between Statistical Physics and Reinforcement Learning

Authors: Jad Rahme, Ryan P. Adams

Abstract: Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforc… ▽ More Sequential decision making in the presence of uncertainty and stochastic dynamics gives rise to distributions over state/action trajectories in reinforcement learning (RL) and optimal control problems. This observation has led to a variety of connections between RL and inference in probabilistic graphical models (PGMs). Here we explore a different dimension to this relationship, examining reinforcement learning using the tools and abstractions of statistical physics. The central object in the statistical physics abstraction is the idea of a partition function $\mathcal{Z}$, and here we construct a partition function from the ensemble of possible trajectories that an agent might take in a Markov decision process. Although value functions and $Q$-functions can be derived from this partition function and interpreted via average energies, the $\mathcal{Z}$-function provides an object with its own Bellman equation that can form the basis of alternative dynamic programming approaches. Moreover, when the MDP dynamics are deterministic, the Bellman equation for $\mathcal{Z}$ is linear, allowing direct solutions that are unavailable for the nonlinear equations associated with traditional value functions. The policies learned via these $\mathcal{Z}$-based Bellman updates are tightly linked to Boltzmann-like policy parameterizations. In addition to sampling actions proportionally to the exponential of the expected cumulative reward as Boltzmann policies would, these policies take entropy into account favoring states from which many outcomes are possible. △ Less

Submitted 29 September, 2021; v1 submitted 24 June, 2019; originally announced June 2019.

arXiv:1905.12107 [pdf, ps, other]

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

Authors: Igor Fedorov, Ryan P. Adams, Matthew Mattina, Paul N. Whatmough

Abstract: The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Internet of Things (IoT) promises to inject machine learning into many of these every-day objects via tiny, cheap MCUs. However, these resource-impoverished hardware pl… ▽ More The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Internet of Things (IoT) promises to inject machine learning into many of these every-day objects via tiny, cheap MCUs. However, these resource-impoverished hardware platforms severely limit the complexity of machine learning models that can be deployed. For example, although convolutional neural networks (CNNs) achieve state-of-the-art results on many visual recognition tasks, CNN inference on MCUs is challenging due to severe finite memory limitations. To circumvent the memory challenge associated with CNNs, various alternatives have been proposed that do fit within the memory budget of an MCU, albeit at the cost of prediction accuracy. This paper challenges the idea that CNNs are not suitable for deployment on MCUs. We demonstrate that it is possible to automatically design CNNs which generalize well, while also being small enough to fit onto memory-limited MCUs. Our Sparse Architecture Search method combines neural architecture search with pruning in a single, unified approach, which learns superior models on four popular IoT datasets. The CNNs we find are more accurate and up to $4.35\times$ smaller than previous approaches, while meeting the strict MCU working memory constraint. △ Less

Submitted 28 May, 2019; originally announced May 2019.

arXiv:1905.07006 [pdf, other]

Efficient Optimization of Loops and Limits with Randomized Telescoping Sums

Authors: Alex Beatson, Ryan P. Adams

Abstract: We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objectiv… ▽ More We consider optimization problems in which the objective requires an inner loop with many steps or is the limit of a sequence of increasingly costly approximations. Meta-learning, training recurrent neural networks, and optimization of the solutions to differential equations are all examples of optimization problems with this character. In such problems, it can be expensive to compute the objective function value and its gradient, but truncating the loop or using less accurate approximations can induce biases that damage the overall solution. We propose randomized telescope (RT) gradient estimators, which represent the objective as the sum of a telescoping series and sample linear combinations of terms to provide cheap unbiased gradient estimates. We identify conditions under which RT estimators achieve optimization convergence rates independent of the length of the loop or the required accuracy of the approximation. We also derive a method for tuning RT estimators online to maximize a lower bound on the expected decrease in loss per unit of computation. We evaluate our adaptive RT estimators on a range of applications including meta-optimization of learning rates, variational inference of ODE parameters, and training an LSTM to model long sequences. △ Less

Submitted 16 May, 2019; originally announced May 2019.

arXiv:1811.08545 [pdf, other]

doi 10.1021/acscentsci.9b00085

Rapid Prediction of Electron-Ionization Mass Spectrometry using Neural Networks

Authors: Jennifer N. Wei, David Belanger, Ryan P. Adams, D. Sculley

Abstract: When confronted with a substance of unknown identity, researchers often perform mass spectrometry on the sample and compare the observed spectrum to a library of previously-collected spectra to identify the molecule. While popular, this approach will fail to identify molecules that are not in the existing library. In response, we propose to improve the library's coverage by augmenting it with synt… ▽ More When confronted with a substance of unknown identity, researchers often perform mass spectrometry on the sample and compare the observed spectrum to a library of previously-collected spectra to identify the molecule. While popular, this approach will fail to identify molecules that are not in the existing library. In response, we propose to improve the library's coverage by augmenting it with synthetic spectra that are predicted using machine learning. We contribute a lightweight neural network model that quickly predicts mass spectra for small molecules. Achieving high accuracy predictions requires a novel neural network architecture that is designed to capture typical fragmentation patterns from electron ionization. We analyze the effects of our modeling innovations on library matching performance and compare our models to prior machine learning-based work on spectrum prediction. △ Less

Submitted 17 March, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

Comments: 12 pages, 5 figures

Journal ref: ACS Cent. Sci. 2019 5 (4) 700-708

arXiv:1807.06732 [pdf, other]

Motivating the Rules of the Game for Adversarial Example Research

Authors: Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, George E. Dahl

Abstract: Advances in machine learning have led to broad deployment of systems with impressive performance on important problems. Nonetheless, these systems can be induced to make errors on data that are surprisingly similar to examples the learned system handles correctly. The existence of these errors raises a variety of questions about out-of-sample generalization and whether bad actors might use such ex… ▽ More Advances in machine learning have led to broad deployment of systems with impressive performance on important problems. Nonetheless, these systems can be induced to make errors on data that are surprisingly similar to examples the learned system handles correctly. The existence of these errors raises a variety of questions about out-of-sample generalization and whether bad actors might use such examples to abuse deployed systems. As a result of these security concerns, there has been a flurry of recent papers proposing algorithms to defend against such malicious perturbations of correctly handled examples. It is unclear how such misclassifications represent a different kind of security problem than other errors, or even other attacker-produced examples that have no specific relationship to an uncorrupted input. In this paper, we argue that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern. Furthermore, defense papers have not yet precisely described all the abilities and limitations of attackers that would be relevant in practical security. Towards this end, we establish a taxonomy of motivations, constraints, and abilities for more plausible adversaries. Finally, we provide a series of recommendations outlining a path forward for future work to more clearly articulate the threat model and perform more meaningful evaluation. △ Less

Submitted 19 July, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

arXiv:1804.05862 [pdf, other]

Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach

Authors: Wenda Zhou, Victor Veitch, Morgane Austern, Ryan P. Adams, Peter Orbanz

Abstract: Modern neural networks are highly overparameterized, with capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be "compressed" to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization… ▽ More Modern neural networks are highly overparameterized, with capacity to substantially overfit to training data. Nevertheless, these networks often generalize well in practice. It has also been observed that trained networks can often be "compressed" to much smaller representations. The purpose of this paper is to connect these two empirical observations. Our main technical result is a generalization bound for compressed networks based on the compressed size. Combined with off-the-shelf compression algorithms, the bound leads to state of the art generalization guarantees; in particular, we provide the first non-vacuous generalization guarantees for realistic architectures applied to the ImageNet classification problem. As additional evidence connecting compression and generalization, we show that compressibility of models that tend to overfit is limited: We establish an absolute limit on expected compressibility as a function of expected generalization error, where the expectations are over the random choice of training examples. The bounds are complemented by empirical results that show an increase in overfitting implies an increase in the number of bits required to describe a trained network. △ Less

Submitted 24 February, 2019; v1 submitted 16 April, 2018; originally announced April 2018.

Comments: 16 pages, 1 figure. Accepted at ICLR 2019

arXiv:1803.00113 [pdf, other]

Approximate Inference for Constructing Astronomical Catalogs from Images

Authors: Jeffrey Regier, Andrew C. Miller, David Schlegel, Ryan P. Adams, Jon D. McAuliffe, Prabhat

Abstract: We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a random variable with parameters that depend on the latent properties of stars and galaxies. These latent properties are themselves modeled as random. We compare two procedures for posterior inference. One procedure is based on Markov chain Monte Ca… ▽ More We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a random variable with parameters that depend on the latent properties of stars and galaxies. These latent properties are themselves modeled as random. We compare two procedures for posterior inference. One procedure is based on Markov chain Monte Carlo (MCMC) while the other is based on variational inference (VI). The MCMC procedure excels at quantifying uncertainty, while the VI procedure is 1000 times faster. On a supercomputer, the VI procedure efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50 terabytes of images in 14.6 minutes, demonstrating the scaling characteristics necessary to construct catalogs for upcoming astronomical surveys. △ Less

Submitted 9 April, 2019; v1 submitted 28 February, 2018; originally announced March 2018.

Comments: accepted to the Annals of Applied Statistics

MSC Class: 62P35 ACM Class: G.3

arXiv:1802.03451 [pdf, other]

Estimating the Spectral Density of Large Implicit Matrices

Authors: Ryan P. Adams, Jeffrey Pennington, Matthew J. Johnson, Jamie Smith, Yaniv Ovadia, Brian Patton, James Saunderson

Abstract: Many important problems are characterized by the eigenvalues of a large matrix. For example, the difficulty of many optimization problems, such as those arising from the fitting of large models in statistics and machine learning, can be investigated via the spectrum of the Hessian of the empirical loss function. Network data can be understood via the eigenstructure of a graph Laplacian matrix usin… ▽ More Many important problems are characterized by the eigenvalues of a large matrix. For example, the difficulty of many optimization problems, such as those arising from the fitting of large models in statistics and machine learning, can be investigated via the spectrum of the Hessian of the empirical loss function. Network data can be understood via the eigenstructure of a graph Laplacian matrix using spectral graph theory. Quantum simulations and other many-body problems are often characterized via the eigenvalues of the solution space, as are various dynamic systems. However, naive eigenvalue estimation is computationally expensive even when the matrix can be represented; in many of these situations the matrix is so large as to only be available implicitly via products with vectors. Even worse, one may only have noisy estimates of such matrix vector products. In this work, we combine several different techniques for randomized estimation and show that it is possible to construct unbiased estimators to answer a broad class of questions about the spectra of such implicit matrices, even in the presence of noise. We validate these methods on large-scale problems in which graph theory and random matrix theory provide ground truth. △ Less

Submitted 9 February, 2018; originally announced February 2018.

arXiv:1709.09216 [pdf, other]

PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference

Authors: Jonathan H. Huggins, Ryan P. Adams, Tamara Broderick

Abstract: Generalized linear models (GLMs) -- such as logistic regression, Poisson regression, and robust regression -- provide interpretable models for diverse data types. Probabilistic approaches, particularly Bayesian ones, allow coherent estimates of uncertainty, incorporation of prior information, and sharing of power across experiments via hierarchical models. In practice, however, the approximate Bay… ▽ More Generalized linear models (GLMs) -- such as logistic regression, Poisson regression, and robust regression -- provide interpretable models for diverse data types. Probabilistic approaches, particularly Bayesian ones, allow coherent estimates of uncertainty, incorporation of prior information, and sharing of power across experiments via hierarchical models. In practice, however, the approximate Bayesian methods necessary for inference have either failed to scale to large data sets or failed to provide theoretical guarantees on the quality of inference. We propose a new approach based on constructing polynomial approximate sufficient statistics for GLMs (PASS-GLM). We demonstrate that our method admits a simple algorithm as well as trivial streaming and distributed extensions that do not compound error across computations. We provide theoretical guarantees on the quality of point (MAP) estimates, the approximate posterior, and posterior mean and uncertainty estimates. We validate our approach empirically in the case of logistic regression using a quadratic approximation and show competitive performance with stochastic gradient descent, MCMC, and the Laplace approximation in terms of speed and multiple measures of accuracy -- including on an advertising data set with 40 million data points and 20,000 covariates. △ Less

Submitted 17 December, 2018; v1 submitted 26 September, 2017; originally announced September 2017.

Comments: In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017). v3: corrected typos in Appendix A

arXiv:1705.07880 [pdf, other]

Reducing Reparameterization Gradient Variance

Authors: Andrew C. Miller, Nicholas J. Foti, Alexander D'Amour, Ryan P. Adams

Abstract: Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the "reparameterization trick," represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. One… ▽ More Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparameterization gradients, or gradient estimates computed via the "reparameterization trick," represent a class of noisy gradients often used in Monte Carlo variational inference (MCVI). However, when these gradient estimators are too noisy, the optimization procedure can be slow or fail to converge. One way to reduce noise is to use more samples for the gradient estimate, but this can be computationally expensive. Instead, we view the noisy gradient as a random variable, and form an inexpensive approximation of the generating procedure for the gradient sample. This approximation has high correlation with the noisy gradient by construction, making it a useful control variate for variance reduction. We demonstrate our approach on non-conjugate multi-level hierarchical models and a Bayesian neural net where we observed gradient variance reductions of multiple orders of magnitude (20-2,000x). △ Less

Submitted 22 May, 2017; originally announced May 2017.

arXiv:1704.04997 [pdf, other]

Multimodal Prediction and Personalization of Photo Edits with Deep Generative Models

Authors: Ardavan Saeedi, Matthew D. Hoffman, Stephen J. DiVerdi, Asma Ghandeharioun, Matthew J. Johnson, Ryan P. Adams

Abstract: Professional-grade software applications are powerful but complicated$-$expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user is confronted with an array of cryptic sliders like "clarity", "temp", and "highlights". An automatically generated suggestion could help, but there is no singl… ▽ More Professional-grade software applications are powerful but complicated$-$expert users can achieve impressive results, but novices often struggle to complete even basic tasks. Photo editing is a prime example: after loading a photo, the user is confronted with an array of cryptic sliders like "clarity", "temp", and "highlights". An automatically generated suggestion could help, but there is no single "correct" edit for a given image$-$different experts may make very different aesthetic decisions when faced with the same image, and a single expert may make different choices depending on the intended use of the image (or on a whim). We therefore want a system that can propose multiple diverse, high-quality edits while also learning from and adapting to a user's aesthetic preferences. In this work, we develop a statistical model that meets these objectives. Our model builds on recent advances in neural network generative modeling and scalable inference, and uses hierarchical structure to learn editing patterns across many diverse users. Empirically, we find that our model outperforms other approaches on this challenging multimodal prediction task. △ Less

Submitted 17 April, 2017; originally announced April 2017.

arXiv:1611.06585 [pdf, other]

Variational Boosting: Iteratively Refining Posterior Approximations

Authors: Andrew C. Miller, Nicholas Foti, Ryan P. Adams

Abstract: We propose a black-box variational inference method to approximate intractable distributions with an increasingly rich approximating class. Our method, termed variational boosting, iteratively refines an existing variational approximation by solving a sequence of optimization problems, allowing the practitioner to trade computation time for accuracy. We show how to expand the variational approxima… ▽ More We propose a black-box variational inference method to approximate intractable distributions with an increasingly rich approximating class. Our method, termed variational boosting, iteratively refines an existing variational approximation by solving a sequence of optimization problems, allowing the practitioner to trade computation time for accuracy. We show how to expand the variational approximating class by incorporating additional covariance structure and by introducing new components to form a mixture. We apply variational boosting to synthetic and real statistical models, and show that resulting posterior inferences compare favorably to existing posterior approximation algorithms in both accuracy and efficiency. △ Less

Submitted 19 February, 2017; v1 submitted 20 November, 2016; originally announced November 2016.

Comments: 25 pages, 9 figures, 2 tables

arXiv:1610.08466 [pdf, other]

Recurrent switching linear dynamical systems

Authors: Scott W. Linderman, Andrew C. Miller, Ryan P. Adams, David M. Blei, Liam Paninski, Matthew J. Johnson

Abstract: Many natural systems, such as neurons firing in the brain or basketball teams traversing a court, give rise to time series data with complex, nonlinear dynamics. We can gain insight into these systems by decomposing the data into segments that are each explained by simpler dynamic units. Building on switching linear dynamical systems (SLDS), we present a new model class that not only discovers the… ▽ More Many natural systems, such as neurons firing in the brain or basketball teams traversing a court, give rise to time series data with complex, nonlinear dynamics. We can gain insight into these systems by decomposing the data into segments that are each explained by simpler dynamic units. Building on switching linear dynamical systems (SLDS), we present a new model class that not only discovers these dynamical units, but also explains how their switching behavior depends on observations or continuous latent states. These "recurrent" switching linear dynamical systems provide further insight by discovering the conditions under which each unit is deployed, something that traditional SLDS models fail to do. We leverage recent algorithmic advances in approximate inference to make Bayesian inference in these models easy, fast, and scalable. △ Less

Submitted 26 October, 2016; originally announced October 2016.

Comments: 15 pages, 6 figures

arXiv:1610.08465 [pdf, other]

Bayesian latent structure discovery from multi-neuron recordings

Authors: Scott W. Linderman, Ryan P. Adams, Jonathan W. Pillow

Abstract: Neural circuits contain heterogeneous groups of neurons that differ in type, location, connectivity, and basic response properties. However, traditional methods for dimensionality reduction and clustering are ill-suited to recovering the structure underlying the organization of neural circuits. In particular, they do not take advantage of the rich temporal dependencies in multi-neuron recordings a… ▽ More Neural circuits contain heterogeneous groups of neurons that differ in type, location, connectivity, and basic response properties. However, traditional methods for dimensionality reduction and clustering are ill-suited to recovering the structure underlying the organization of neural circuits. In particular, they do not take advantage of the rich temporal dependencies in multi-neuron recordings and fail to account for the noise in neural spike trains. Here we describe new tools for inferring latent structure from simultaneously recorded spike train data using a hierarchical extension of a multi-neuron point process model commonly known as the generalized linear model (GLM). Our approach combines the GLM with flexible graph-theoretic priors governing the relationship between latent features and neural connectivity patterns. Fully Bayesian inference via Pólya-gamma augmentation of the resulting model allows us to classify neurons and infer latent dimensions of circuit organization from correlated spike trains. We demonstrate the effectiveness of our method with applications to synthetic data and multi-neuron recordings in primate retina, revealing latent patterns of neural types and locations from spike trains alone. △ Less

Submitted 26 October, 2016; originally announced October 2016.

Comments: 11 pages, 5 figures, to appear in Advances in Neural Information Processing Systems 2016

arXiv:1610.02415 [pdf, other]

doi 10.1021/acscentsci.7b00572

Automatic chemical design using a data-driven continuous representation of molecules

Authors: Rafael Gómez-Bombarelli, Jennifer N. Wei, David Duvenaud, José Miguel Hernández-Lobato, Benjamín Sánchez-Lengeling, Dennis Sheberla, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, Alán Aspuru-Guzik

Abstract: We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an enc… ▽ More We report a method to convert discrete representations of molecules to and from a multidimensional continuous representation. This model allows us to generate new molecules for efficient exploration and optimization through open-ended spaces of chemical compounds. A deep neural network was trained on hundreds of thousands of existing chemical structures to construct three coupled functions: an encoder, a decoder and a predictor. The encoder converts the discrete representation of a molecule into a real-valued continuous vector, and the decoder converts these continuous vectors back to discrete molecular representations. The predictor estimates chemical properties from the latent continuous vector representation of the molecule. Continuous representations allow us to automatically generate novel chemical structures by performing simple operations in the latent space, such as decoding random vectors, perturbing known chemical structures, or interpolating between molecules. Continuous representations also allow the use of powerful gradient-based optimization to efficiently guide the search for optimized functional compounds. We demonstrate our method in the domain of drug-like molecules and also in the set of molecules with fewer that nine heavy atoms. △ Less

Submitted 5 December, 2017; v1 submitted 7 October, 2016; originally announced October 2016.

Comments: 26 pages, 8 figures

arXiv:1606.05896 [pdf, other]

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

Authors: Akash Srivastava, James Zou, Ryan P. Adams, Charles Sutton

Abstract: A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when it is easy for an analyst to know a good clustering when they see one. We present a new approach to interactive clustering for data exploration called TINDER, b… ▽ More A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when it is easy for an analyst to know a good clustering when they see one. We present a new approach to interactive clustering for data exploration called TINDER, based on a particularly simple feedback mechanism, in which an analyst can reject a given clustering and request a new one, which is chosen to be different from the previous clustering while fitting the data well. We formalize this interaction in a Bayesian framework as a method for prior elicitation, in which each different clustering is produced by a prior distribution that is modified to discourage previously rejected clusterings. We show that TINDER successfully produces a diverse set of clusterings, each of equivalent quality, that are much more diverse than would be obtained by randomized restarts. △ Less

Submitted 19 June, 2016; originally announced June 2016.

Comments: presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

arXiv:1603.06277 [pdf, other]

Composing graphical models with neural networks for structured representations and fast inference

Authors: Matthew J. Johnson, David Duvenaud, Alexander B. Wiltschko, Sandeep R. Datta, Ryan P. Adams

Abstract: We propose a general modeling and inference framework that composes probabilistic graphical models with deep learning methods and combines their respective strengths. Our model family augments graphical structure in latent variables with neural network observation models. For inference, we extend variational autoencoders to use graphical model approximating distributions with recognition networks… ▽ More We propose a general modeling and inference framework that composes probabilistic graphical models with deep learning methods and combines their respective strengths. Our model family augments graphical structure in latent variables with neural network observation models. For inference, we extend variational autoencoders to use graphical model approximating distributions with recognition networks that output conjugate potentials. All components of these models are learned simultaneously with a single objective, giving a scalable algorithm that leverages stochastic variational inference, natural gradients, graphical model message passing, and the reparameterization trick. We illustrate this framework with several example models and an application to mouse behavioral phenotyping. △ Less

Submitted 7 July, 2017; v1 submitted 20 March, 2016; originally announced March 2016.

Comments: v5 fixes tex compilation bugs and also a math bug in the statement and proof of Prop. 4.1 (and D.3). v4 adds two paragraphs to the related work section and fixes typos in the appendices. v3 fixes some typos in the appendices. v2 is a rewrite from v1 to be more readable and to include detailed appendices

arXiv:1602.05221 [pdf, other]

Patterns of Scalable Bayesian Inference

Authors: Elaine Angelino, Matthew James Johnson, Ryan P. Adams

Abstract: Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the impor… ▽ More Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the importance of asymptotic correctness. As a result, there is a zoo of ideas with few clear overarching principles. In this paper, we seek to identify unifying principles, patterns, and intuitions for scaling Bayesian inference. We review existing work on utilizing modern computing resources with both MCMC and variational approximation techniques. From this taxonomy of ideas, we characterize the general principles that have proven successful for designing scalable inference procedures and comment on the path forward. △ Less

Submitted 22 March, 2016; v1 submitted 16 February, 2016; originally announced February 2016.

arXiv:1511.09422 [pdf, other]

A General Framework for Constrained Bayesian Optimization using Information-based Search

Authors: José Miguel Hernández-Lobato, Michael A. Gelbart, Ryan P. Adams, Matthew W. Hoffman, Zoubin Ghahramani

Abstract: We present an information-theoretic framework for solving global black-box optimization problems that also have black-box constraints. Of particular interest to us is to efficiently solve problems with decoupled constraints, in which subsets of the objective and constraint functions may be evaluated independently. For example, when the objective is evaluated on a CPU and the constraints are evalua… ▽ More We present an information-theoretic framework for solving global black-box optimization problems that also have black-box constraints. Of particular interest to us is to efficiently solve problems with decoupled constraints, in which subsets of the objective and constraint functions may be evaluated independently. For example, when the objective is evaluated on a CPU and the constraints are evaluated independently on a GPU. These problems require an acquisition function that can be separated into the contributions of the individual function evaluations. We develop one such acquisition function and call it Predictive Entropy Search with Constraints (PESC). PESC is an approximation to the expected information gain criterion and it compares favorably to alternative approaches based on improvement in several synthetic and real-world problems. In addition to this, we consider problems with a mix of functions that are fast and slow to evaluate. These problems require balancing the amount of time spent in the meta-computation of PESC and in the actual evaluation of the target objective. We take a bounded rationality approach and develop partial update for PESC which trades off accuracy against speed. We then propose a method for adaptively switching between the partial and full updates for PESC. This allows us to interpolate between versions of PESC that are efficient in terms of function evaluations and those that are efficient in terms of wall-clock time. Overall, we demonstrate that PESC is an effective algorithm that provides a promising direction towards a unified solution for constrained Bayesian optimization. △ Less

Submitted 4 September, 2016; v1 submitted 30 November, 2015; originally announced November 2015.

arXiv:1511.05467 [pdf, other]

Predictive Entropy Search for Multi-objective Bayesian Optimization

Authors: Daniel Hernández-Lobato, José Miguel Hernández-Lobato, Amar Shah, Ryan P. Adams

Abstract: We present PESMO, a Bayesian method for identifying the Pareto set of multi-objective optimization problems, when the functions are expensive to evaluate. The central idea of PESMO is to choose evaluation points so as to maximally reduce the entropy of the posterior distribution over the Pareto set. Critically, the PESMO multi-objective acquisition function can be decomposed as a sum of objective-… ▽ More We present PESMO, a Bayesian method for identifying the Pareto set of multi-objective optimization problems, when the functions are expensive to evaluate. The central idea of PESMO is to choose evaluation points so as to maximally reduce the entropy of the posterior distribution over the Pareto set. Critically, the PESMO multi-objective acquisition function can be decomposed as a sum of objective-specific acquisition functions, which enables the algorithm to be used in \emph{decoupled} scenarios in which the objectives can be evaluated separately and perhaps with different costs. This decoupling capability also makes it possible to identify difficult objectives that require more evaluations. PESMO also offers gains in efficiency, as its cost scales linearly with the number of objectives, in comparison to the exponential cost of other methods. We compare PESMO with other related methods for multi-objective Bayesian optimization on synthetic and real-world problems. The results show that PESMO produces better recommendations with a smaller number of evaluations of the objectives, and that a decoupled evaluation can lead to improvements in performance, particularly when the number of objectives is large. △ Less

Submitted 21 February, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

arXiv:1511.02543 [pdf, other]

Sandwiching the marginal likelihood using bidirectional Monte Carlo

Authors: Roger B. Grosse, Zoubin Ghahramani, Ryan P. Adams

Abstract: Computing the marginal likelihood (ML) of a model requires marginalizing out all of the parameters and latent variables, a difficult high-dimensional summation or integration problem. To make matters worse, it is often hard to measure the accuracy of one's ML estimates. We present bidirectional Monte Carlo, a technique for obtaining accurate log-ML estimates on data simulated from a model. This me… ▽ More Computing the marginal likelihood (ML) of a model requires marginalizing out all of the parameters and latent variables, a difficult high-dimensional summation or integration problem. To make matters worse, it is often hard to measure the accuracy of one's ML estimates. We present bidirectional Monte Carlo, a technique for obtaining accurate log-ML estimates on data simulated from a model. This method obtains stochastic lower bounds on the log-ML using annealed importance sampling or sequential Monte Carlo, and obtains stochastic upper bounds by running these same algorithms in reverse starting from an exact posterior sample. The true value can be sandwiched between these two stochastic bounds with high probability. Using the ground truth log-ML estimates obtained from our method, we quantitatively evaluate a wide variety of existing ML estimators on several latent variable models: clustering, a low rank approximation, and a binary attributes model. These experiments yield insights into how to accurately estimate marginal likelihoods. △ Less

Submitted 8 November, 2015; originally announced November 2015.

arXiv:1509.09292 [pdf, other]

Convolutional Networks on Graphs for Learning Molecular Fingerprints

Authors: David Duvenaud, Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, Ryan P. Adams

Abstract: We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predic… ▽ More We introduce a convolutional neural network that operates directly on graphs. These networks allow end-to-end learning of prediction pipelines whose inputs are graphs of arbitrary size and shape. The architecture we present generalizes standard molecular feature extraction methods based on circular fingerprints. We show that these data-driven features are more interpretable, and have better predictive performance on a variety of tasks. △ Less

Submitted 3 November, 2015; v1 submitted 30 September, 2015; originally announced September 2015.

Comments: 9 pages, 5 figures. To appear in Neural Information Processing Systems (NIPS)

arXiv:1507.03228 [pdf, other]

Scalable Bayesian Inference for Excitatory Point Process Networks

Authors: Scott W. Linderman, Ryan P. Adams

Abstract: Networks capture our intuition about relationships in the world. They describe the friendships between Facebook users, interactions in financial markets, and synapses connecting neurons in the brain. These networks are richly structured with cliques of friends, sectors of stocks, and a smorgasbord of cell types that govern how neurons connect. Some networks, like social network friendships, can be… ▽ More Networks capture our intuition about relationships in the world. They describe the friendships between Facebook users, interactions in financial markets, and synapses connecting neurons in the brain. These networks are richly structured with cliques of friends, sectors of stocks, and a smorgasbord of cell types that govern how neurons connect. Some networks, like social network friendships, can be directly observed, but in many cases we only have an indirect view of the network through the actions of its constituents and an understanding of how the network mediates that activity. In this work, we focus on the problem of latent network discovery in the case where the observable activity takes the form of a mutually-excitatory point process known as a Hawkes process. We build on previous work that has taken a Bayesian approach to this problem, specifying prior distributions over the latent network structure and a likelihood of observed activity given this network. We extend this work by proposing a discrete-time formulation and developing a computationally efficient stochastic variational inference (SVI) algorithm that allows us to scale the approach to long sequences of observations. We demonstrate our algorithm on the calcium imaging data used in the Chalearn neural connectomics challenge. △ Less

Submitted 12 July, 2015; originally announced July 2015.

arXiv:1506.05843 [pdf, other]

Dependent Multinomial Models Made Easy: Stick Breaking with the Pólya-Gamma Augmentation

Authors: Scott W. Linderman, Matthew J. Johnson, Ryan P. Adams

Abstract: Many practical modeling problems involve discrete data that are best represented as draws from multinomial or categorical distributions. For example, nucleotides in a DNA sequence, children's names in a given state and year, and text documents are all commonly modeled with multinomial distributions. In all of these cases, we expect some form of dependency between the draws: the nucleotide at one p… ▽ More Many practical modeling problems involve discrete data that are best represented as draws from multinomial or categorical distributions. For example, nucleotides in a DNA sequence, children's names in a given state and year, and text documents are all commonly modeled with multinomial distributions. In all of these cases, we expect some form of dependency between the draws: the nucleotide at one position in the DNA strand may depend on the preceding nucleotides, children's names are highly correlated from year to year, and topics in text may be correlated and dynamic. These dependencies are not naturally captured by the typical Dirichlet-multinomial formulation. Here, we leverage a logistic stick-breaking representation and recent innovations in Pólya-gamma augmentation to reformulate the multinomial distribution in terms of latent variables with jointly Gaussian likelihoods, enabling us to take advantage of a host of Bayesian inference techniques for Gaussian models with minimal overhead. △ Less

Submitted 18 June, 2015; originally announced June 2015.

arXiv:1506.03767 [pdf, other]

Spectral Representations for Convolutional Neural Networks

Authors: Oren Rippel, Jasper Snoek, Ryan P. Adams

Abstract: Discrete Fourier transforms provide a significant speedup in the computation of convolutions in deep learning. In this work, we demonstrate that, beyond its advantages for efficient computation, the spectral domain also provides a powerful representation in which to model and train convolutional neural networks (CNNs). We employ spectral representations to introduce a number of innovations to CN… ▽ More Discrete Fourier transforms provide a significant speedup in the computation of convolutions in deep learning. In this work, we demonstrate that, beyond its advantages for efficient computation, the spectral domain also provides a powerful representation in which to model and train convolutional neural networks (CNNs). We employ spectral representations to introduce a number of innovations to CNN design. First, we propose spectral pooling, which performs dimensionality reduction by truncating the representation in the frequency domain. This approach preserves considerably more information per parameter than other pooling strategies and enables flexibility in the choice of pooling output dimensionality. This representation also enables a new form of stochastic regularization by randomized modification of resolution. We show that these methods achieve competitive results on classification and approximation tasks, without using any dropout or max-pooling. Finally, we demonstrate the effectiveness of complex-coefficient spectral parameterization of convolutional filters. While this leaves the underlying model unchanged, it results in a representation that greatly facilitates optimization. We observe on a variety of popular CNN configurations that this leads to significantly faster convergence during training. △ Less

Submitted 11 June, 2015; originally announced June 2015.

arXiv:1504.01344 [pdf, other]

Early Stopping is Nonparametric Variational Inference

Authors: Dougal Maclaurin, David Duvenaud, Ryan P. Adams

Abstract: We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a… ▽ More We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric variational approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization updates. By tracking the change in entropy over this sequence of transformations during optimization, we form a scalable, unbiased estimate of the variational lower bound on the log marginal likelihood. We can use this bound to optimize hyperparameters instead of using cross-validation. This Bayesian interpretation of SGD suggests improved, overfitting-resistant optimization procedures, and gives a theoretical foundation for popular tricks such as early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models. △ Less

Submitted 6 April, 2015; originally announced April 2015.

Comments: 8 pages, 5 figures

Showing 1–50 of 83 results for author: Adams, R P