-
Aligning AI-driven discovery with human intuition
Authors:
Kevin Zhang,
Hod Lipson
Abstract:
As data-driven modeling of physical dynamical systems becomes more prevalent, a new challenge is emerging: making these models more compatible and aligned with existing human knowledge. AI-driven scientific modeling processes typically begin with identifying hidden state variables, then deriving governing equations, followed by predicting and analyzing future behaviors. The critical initial step of identifying an appropriate set of state variables remains challenging for two reasons. First, finding a compact set of meaningfully predictive variables is mathematically difficult and under-defined. Second, the variables found often lack physical significance, and are therefore difficult for human scientists to interpret. We propose a new general principle for distilling representations that are naturally more aligned with human intuition, without relying on prior physical knowledge. We demonstrate our approach on a number of experimental and simulated systems where the variables generated by the AI closely resemble those chosen independently by human scientists. We suggest that this principle can help make human-AI collaboration more fruitful, as well as shed light on how humans make scientific modeling choices.
Submitted 9 October, 2024;
originally announced October 2024.
-
Sequencing the Neurome: Towards Scalable Exact Parameter Reconstruction of Black-Box Neural Networks
Authors:
Judah Goldfeder,
Quinten Roets,
Gabe Guo,
John Wright,
Hod Lipson
Abstract:
Inferring the exact parameters of a neural network with only query access is an NP-Hard problem, with few practical existing algorithms. Solutions would have major implications for security, verification, interpretability, and understanding biological networks. The key challenges are the massive parameter space, and complex non-linear relationships between neurons. We resolve these challenges using two insights. First, we observe that almost all networks used in practice are produced by random initialization and first order optimization, an inductive bias that drastically reduces the practical parameter space. Second, we present a novel query generation algorithm that produces maximally informative samples, letting us untangle the non-linear relationships efficiently. We demonstrate reconstruction of a hidden network containing over 1.5 million parameters, and of one 7 layers deep, the largest and deepest reconstructions to date, with max parameter difference less than 0.0001, and illustrate robustness and scalability across a variety of architectures, datasets, and training procedures.
Submitted 27 September, 2024;
originally announced September 2024.
-
Diffusion Models Are Promising for Ab Initio Structure Solutions from Nanocrystalline Powder Diffraction Data
Authors:
Gabe Guo,
Tristan Saidi,
Maxwell Terban,
Simon JL Billinge,
Hod Lipson
Abstract:
A major challenge in materials science is the determination of the structure of nanometer-sized objects. Here we present a novel approach that uses a generative machine learning model based on a diffusion model that is trained on 45,229 known structures. The model factors in both the measured diffraction pattern and relevant statistical priors on the unit cell of atomic cluster structures. Conditioned only on the chemical formula and the information-scarce finite-size broadened powder diffraction pattern, we find that our model, PXRDnet, can successfully solve simulated nanocrystals as small as 10 angstroms across 200 materials of varying symmetry and complexity, including structures from all seven crystal systems. We show that our model can determine structural solutions with up to $81.5\%$ accuracy, as measured by structural correlation. Furthermore, PXRDnet is capable of solving structures from noisy diffraction patterns gathered in real-world experiments. We suggest that data-driven approaches, bootstrapped from theoretical simulation, will ultimately provide a path towards determining the structure of previously unsolved nano-materials.
Submitted 15 June, 2024;
originally announced June 2024.
-
Reconfigurable Robot Identification from Motion Data
Authors:
Yuhang Hu,
Yunzhe Wang,
Ruibo Liu,
Zhou Shen,
Hod Lipson
Abstract:
Integrating Large Language Models (LLMs) and Vision-Language Models (VLMs) with robotic systems enables robots to process and understand complex natural language instructions and visual information. However, a fundamental challenge remains: for robots to fully capitalize on these advancements, they must have a deep understanding of their physical embodiment. The gap between AI models' cognitive capabilities and the understanding of physical embodiment leads to the following question: Can a robot autonomously understand and adapt to its physical form and functionalities through interaction with its environment? This question underscores the transition towards developing self-modeling robots without reliance on external sensory or pre-programmed knowledge about their structure. Here, we propose a meta self-modeling approach that can deduce robot morphology through proprioception (the internal sense of position and movement). Our study introduces a 12 DoF reconfigurable legged robot, accompanied by a diverse dataset of 200k unique configurations, to systematically investigate the relationship between robotic motion and robot morphology. Utilizing a deep neural network model comprising a robot signature encoder and a configuration decoder, we demonstrate the capability of our system to accurately predict robot configurations from proprioceptive signals. This research contributes to the field of robotic self-modeling, aiming to enhance robots' understanding of their physical embodiment and their adaptability in real-world scenarios.
Submitted 15 March, 2024;
originally announced March 2024.
-
Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning
Authors:
Gabe Guo,
Judah Goldfeder,
Ling Lan,
Aniv Ray,
Albert Hanming Yang,
Boyuan Chen,
Simon JL Billinge,
Hod Lipson
Abstract:
The revolution in materials in the past century was built on a knowledge of the atomic arrangements and the structure-property relationship. The sine qua non for obtaining quantitative structural information is single crystal crystallography. However, increasingly we need to solve structures in cases where the information content in our input signal is significantly degraded, for example, due to orientational averaging of grains, finite size effects due to nanostructure, and mixed signals due to sample heterogeneity. Understanding the structure property relationships in such situations is, if anything, more important and insightful, yet we do not have robust approaches for accomplishing it. In principle, machine learning (ML) and deep learning (DL) are promising approaches since they augment information in the degraded input signal with prior knowledge learned from large databases of already known structures. Here we present a novel ML approach, a variational query-based multi-branch deep neural network that has the promise to be a robust but general tool to address this problem end-to-end. We demonstrate the approach on computed powder x-ray diffraction (PXRD), along with partial chemical composition information, as input. We choose as a structural representation a modified electron density we call the Cartesian mapped electron density (CMED), that straightforwardly allows our ML model to learn material structures across different chemistries, symmetries and crystal systems. When evaluated on theoretically simulated data for the cubic and trigonal crystal systems, the system achieves up to $93.4\%$ average similarity with the ground truth on unseen materials, both with known and partially-known chemical composition information, showing great promise for successful structure solution even from degraded and incomplete input data.
Submitted 22 December, 2023;
originally announced December 2023.
-
Assessing SATNet's Ability to Solve the Symbol Grounding Problem
Authors:
Oscar Chang,
Lampros Flokas,
Hod Lipson,
Michael Spranger
Abstract:
SATNet is an award-winning MAXSAT solver that can be used to infer logical rules and integrated as a differentiable layer in a deep neural network. It had been shown to solve Sudoku puzzles visually from examples of puzzle digit images, and was heralded as an impressive achievement towards the longstanding AI goal of combining pattern recognition with logical reasoning. In this paper, we clarify SATNet's capabilities by showing that in the absence of intermediate labels that identify individual Sudoku digit images with their logical representations, SATNet completely fails at visual Sudoku (0% test accuracy). More generally, the failure can be pinpointed to its inability to learn to assign symbols to perceptual phenomena, also known as the symbol grounding problem, which has long been thought to be a prerequisite for intelligent agents to perform real-world logical reasoning. We propose an MNIST based test as an easy instance of the symbol grounding problem that can serve as a sanity check for differentiable symbolic solvers in general. Naive applications of SATNet on this test lead to performance worse than that of models without logical reasoning capabilities. We report on the causes of SATNet's failure and how to prevent them.
Submitted 12 December, 2023;
originally announced December 2023.
-
Balanced and Deterministic Weight-sharing Helps Network Performance
Authors:
Oscar Chang,
Hod Lipson
Abstract:
Weight-sharing plays a significant role in the success of many deep neural networks, by increasing memory efficiency and incorporating useful inductive priors about the problem into the network. But understanding how weight-sharing can be used effectively in general is a topic that has not been studied extensively. Chen et al. [2015] proposed HashedNets, which augments a multi-layer perceptron with a hash table, as a method for neural network compression. We generalize this method into a framework (ArbNets) that allows for efficient arbitrary weight-sharing, and use it to study the role of weight-sharing in neural networks. We show that common neural networks can be expressed as ArbNets with different hash functions. We also present two novel hash functions, the Dirichlet hash and the Neighborhood hash, and use them to demonstrate experimentally that balanced and deterministic weight-sharing helps with the performance of a neural network.
Submitted 13 December, 2023;
originally announced December 2023.
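The hash-table weight sharing described in the abstract above can be illustrated with a minimal sketch. It assumes a simple modular hash (`modular_hash`, a hypothetical stand-in, not the Dirichlet or Neighborhood hash from the paper) and reads "balanced" as every table entry being reused equally often, which is an assumption made here. The names `modular_hash` and `expand_weights` are illustrative only.

```python
import numpy as np

def modular_hash(i, j, n_cols, table_size):
    # Hypothetical deterministic hash: flatten (i, j) and wrap around the table.
    return (i * n_cols + j) % table_size

def expand_weights(table, shape, hash_fn=modular_hash):
    """Build a virtual weight matrix whose entries are looked up in a small table."""
    i, j = np.indices(shape)                      # row and column index grids
    slots = hash_fn(i, j, shape[1], table.size)   # table slot for each position
    return table[slots], slots

# A 2x4 layer backed by only 4 trainable values.
table = np.array([0.1, -0.2, 0.3, 0.4])
W, slots = expand_weights(table, (2, 4))

# How often each table entry is reused across the virtual matrix.
usage = np.bincount(slots.ravel(), minlength=table.size)
```

With this particular hash and shape, every table entry is reused the same number of times; a random hash would generally not give equal counts.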
-
Principled Weight Initialization for Hypernetworks
Authors:
Oscar Chang,
Lampros Flokas,
Hod Lipson
Abstract:
Hypernetworks are meta neural networks that generate weights for a main neural network in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods like Glorot & Bengio (2010) and He et al. (2015), when applied directly on a hypernet, fail to produce weights for the mainnet in the correct scale. We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
Submitted 12 December, 2023;
originally announced December 2023.
-
Accelerating Meta-Learning by Sharing Gradients
Authors:
Oscar Chang,
Hod Lipson
Abstract:
The success of gradient-based meta-learning is primarily attributed to its ability to leverage related tasks to learn task-invariant information. However, the absence of interactions between different tasks in the inner loop leads to task-specific over-fitting in the initial phase of meta-training. While this is eventually corrected by the presence of these interactions in the outer loop, it comes at a significant cost of slower meta-learning. To address this limitation, we explicitly encode task relatedness via an inner loop regularization mechanism inspired by multi-task learning. Our algorithm shares gradient information from previously encountered tasks as well as concurrent tasks in the same task batch, and scales their contribution with meta-learned parameters. We show using two popular few-shot classification datasets that gradient sharing enables meta-learning under bigger inner loop learning rates and can accelerate the meta-training process by up to 134%.
Submitted 12 December, 2023;
originally announced December 2023.
-
Teaching Robots to Build Simulations of Themselves
Authors:
Yuhang Hu,
Jiong Lin,
Hod Lipson
Abstract:
Simulation enables robots to plan and estimate the outcomes of prospective actions without the need to physically execute them. We introduce a self-supervised learning framework to enable robots to model and predict their morphology, kinematics and motor control using only brief raw video data, eliminating the need for extensive real-world data collection and kinematic priors. By observing their own movements, akin to humans watching their reflection in a mirror, robots learn an ability to simulate themselves and predict their spatial motion for various tasks. Our results demonstrate that this self-learned simulation not only enables accurate motion planning but also allows the robot to detect abnormalities and recover from damage.
Submitted 20 November, 2023;
originally announced November 2023.
-
CarbonFish -- A Bistable Underactuated Compliant Fish Robot capable of High Frequency Undulation
Authors:
Zechen Xiong,
Zihan Guo,
Mark Liu,
Jialong Ning,
Hod Lipson
Abstract:
The Hair Clip Mechanism (HCM) represents an innovative in-plane prestressed bistable mechanism, as delineated in our preceding studies, devised to augment the functional prowess of soft robotics. When juxtaposed with conventional soft and compliant robotic systems, HCMs exhibit pronounced rigidity, augmented mobility, good repeatability, and an effective design and fabrication paradigm. In this research, we investigate the feasibility of utilizing carbon fiber reinforced plastic (CFRP) as the foundational material for an HCM-based fish robot, herein referred to as CarbonFish. Our objective centers on realizing high-frequency undulatory motion, thereby laying the groundwork for accelerated aquatic locomotion in subsequent models. We proffer an exhaustive design and fabrication schema underpinned by mathematical principles. Preliminary evaluations of our single-actuated CarbonFish have evidenced an undulation frequency approaching 10 Hz, suggesting its potential to outperform other biologically inspired aquatic entities as well as real fish.
Submitted 13 October, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Designing a Hair-Clip Inspired Bistable Mechanism for Soft Fish Robots
Authors:
Zechen Xiong,
Hod Lipson
Abstract:
The hair clip mechanism (HCM) is an in-plane prestressed bistable mechanism proposed in our previous research [1]-[5] to enhance the functionality of soft robotics. HCMs have several advantages, such as high rigidity, high mobility, good repeatability, and design and fabrication simplicity, compared to existing soft and compliant robotics. Using our experience with fish robots, this work delves into designing a novel HCM robotic propulsion system made from PETG plastic, carbon fiber-reinforced plastic (CFRP), and steel. Detailed derivation and verification of the HCM theory are given, and the influence of key parameters like dimensions, material types, and servo motor specifications is summarized. The design algorithm offers insight into HCM robotics. It enables us to search for suitable components, operate robots at a desired frequency, and achieve high-frequency and high-speed undulatory swimming for fish robots.
Submitted 6 November, 2023;
originally announced November 2023.
-
Knolling bot 2.0: Enhancing Object Organization with Self-supervised Graspability Estimation
Authors:
Yuhang Hu,
Zhizhuo Zhang,
Hod Lipson
Abstract:
Building on recent advancements in transformer-based approaches for domestic robots performing knolling, the art of organizing scattered items into neat arrangements, this paper introduces Knolling bot 2.0. Recognizing the challenges posed by piles of objects or items situated closely together, this upgraded system incorporates a self-supervised graspability estimation model. If objects are deemed ungraspable, an additional behavior is executed to separate the objects before knolling the table. By integrating this grasp prediction mechanism with existing visual perception and transformer-based knolling models, we demonstrate an advanced system capable of decluttering and organizing even more complex and densely populated table settings. Experimental evaluations demonstrate the effectiveness of this module, yielding a graspability prediction accuracy of 95.7%.
Submitted 29 October, 2023;
originally announced October 2023.
-
Accelerating Aquatic Soft Robots with Elastic Instability Effects
Authors:
Zechen Xiong,
Suyu Luohong,
Jeong Hun Lee,
Hod Lipson
Abstract:
Sinusoidal undulation has long been considered the most successful swimming pattern for fish and bionic aquatic robots [1]. However, a swimming pattern generated by the hair clip mechanism (HCM, part iii, Figure 1A) [2]-[5] may challenge this knowledge. HCM is an in-plane prestressed bi-stable mechanism that stores elastic energy and releases the stored energy quickly via its snap-through buckling. When used for fish robots, the HCM functions as the fish body and creates unique swimming patterns that we term HCM undulation. With the same energy consumption [3], HCM fish outperforms the traditionally designed soft fish with a two-fold increase in cruising speed. We reproduce this phenomenon in a single-link simulation with Aquarium [6]. HCM undulation generates an average propulsion of 16.7 N/m, 2-3 times larger than the reference undulation (6.78 N/m), sine pattern (5.34 N/m), and cambering sine pattern (6.36 N/m), and achieves an efficiency close to the sine pattern. These results can aid in developing fish robots and faster swimming patterns.
Submitted 15 July, 2024; v1 submitted 21 October, 2023;
originally announced October 2023.
-
Knolling Bot: Learning Robotic Object Arrangement from Tidy Demonstrations
Authors:
Yuhang Hu,
Zhizhuo Zhang,
Xinyue Zhu,
Ruibo Liu,
Philippe Wyder,
Hod Lipson
Abstract:
Organizing scattered items in domestic spaces is complicated by the diversity and subjective nature of tidiness. Just as the complexity of human language allows for multiple expressions of the same idea, household tidiness preferences and organizational patterns vary widely, so presetting object locations would limit the adaptability to new objects and environments. Inspired by advancements in natural language processing (NLP), this paper introduces a self-supervised learning framework that allows robots to understand and replicate the concept of tidiness from demonstrations of well-organized layouts, akin to using conversational datasets to train Large Language Models (LLMs). We leverage a transformer neural network to predict the placement of subsequent objects. We demonstrate a "knolling" system with a robotic arm and an RGB camera to organize items of varying sizes and quantities on a table. Our method not only trains a generalizable concept of tidiness, enabling the model to provide diverse solutions and adapt to different numbers of objects, but it can also incorporate human preferences to generate customized tidy tables without explicit target positions for each object.
Submitted 15 March, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning
Authors:
Lennart Schulze,
Hod Lipson
Abstract:
A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning tasks in the absence of a classical geometric kinematic model. In particular, when the latter is hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on motion planning tasks as an exemplary downstream application.
Submitted 18 April, 2024; v1 submitted 5 October, 2023;
originally announced October 2023.
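The Chamfer-L2 metric quoted in the abstract above compares two point clouds by nearest-neighbor distances. A minimal sketch follows, assuming one common convention (mean squared nearest-neighbor distance, summed over both directions); conventions differ between papers, and the exact variant and normalization used by the authors are not stated in the abstract. The function name `chamfer_l2` is illustrative.

```python
import numpy as np

def chamfer_l2(a, b):
    """Symmetric Chamfer distance between point sets a (N, D) and b (M, D).

    Assumed convention: mean of squared nearest-neighbor distances,
    summed over the two directions (a -> b and b -> a).
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Two single-point clouds one unit apart along x.
a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])
dist_ab = chamfer_l2(a, b)   # squared distance 1 in each direction
dist_aa = chamfer_l2(a, a)   # identical clouds
```

The pairwise matrix costs O(N·M) memory, so large clouds would normally use a spatial index instead of this brute-force form.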
-
DeepCollide: Scalable Data-Driven High DoF Configuration Space Modeling using Implicit Neural Representations
Authors:
Gabriel Guo,
Judah Goldfeder,
Aniv Ray,
Tony Dear,
Hod Lipson
Abstract:
Collision detection is essential to virtually all robotics applications. However, traditional geometric collision detection methods generally require pre-existing workspace geometry representations; thus, they are unable to infer the collision detection function from sampled data when geometric information is unavailable. Learning-based approaches can overcome this limitation. Following this line of research, we present DeepCollide, an implicit neural representation method for approximating the collision detection function from sampled collision data. As shown by our theoretical analysis and empirical evidence, DeepCollide presents clear benefits over the state-of-the-art, as it relates to time cost scalability with respect to training data and DoF, as well as the ability to accurately express complex workspace geometries. We publicly release our code.
Submitted 13 September, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Direct Robot Configuration Space Construction using Convolutional Encoder-Decoders
Authors:
Christopher Benka,
Carl Gross,
Riya Gupta,
Hod Lipson
Abstract:
Intelligent robots must be able to perform safe and efficient motion planning in their environments. Central to modern motion planning is the configuration space. Configuration spaces define the set of configurations of a robot that result in collisions with obstacles in the workspace, C-clsn, and the set of configurations that do not, C-free. Modern approaches to motion planning first compute the configuration space and then perform motion planning using the calculated configuration space. Real-time motion planning requires accurate and efficient construction of configuration spaces.
We are the first to apply a convolutional encoder-decoder framework for calculating highly accurate approximations to configuration spaces. Our model achieves an average 97.5% F1-score for predicting C-free and C-clsn for 2-D robotic workspaces with a dual-arm robot. Our method limits undetected collisions to less than 2.5% on robotic workspaces that involve translation, rotation, and removal of obstacles. Our model learns highly transferable features between robotic workspaces, requiring little to no fine-tuning to adapt to new transformations of obstacles in the workspace.
Submitted 9 March, 2023;
originally announced March 2023.
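The F1-score reported in the abstract above can be illustrated on a discretized configuration space. This is a minimal sketch, assuming each grid cell is labeled 1 for C-clsn and 0 for C-free and that the per-class scores are averaged with equal weight; the abstract does not say how the authors average, and the names `f1_for_class`, `truth`, and `pred` are hypothetical.

```python
import numpy as np

def f1_for_class(pred, truth, label):
    """F1-score treating `label` as the positive class."""
    p = pred == label
    t = truth == label
    tp = np.sum(p & t)     # predicted positive, actually positive
    fp = np.sum(p & ~t)    # predicted positive, actually negative
    fn = np.sum(~p & t)    # predicted negative, actually positive
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Toy 2x2 configuration-space grid: 1 = C-clsn, 0 = C-free.
truth = np.array([[1, 1],
                  [0, 0]])
pred = np.array([[1, 0],
                 [0, 0]])   # one collision cell is missed

f1_clsn = f1_for_class(pred, truth, 1)
f1_free = f1_for_class(pred, truth, 0)
macro_f1 = (f1_clsn + f1_free) / 2
```

The missed collision cell counts as a false negative for C-clsn and a false positive for C-free, which is why both per-class scores drop below 1.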
-
Rapid grasping of fabric using bionic soft grippers with elastic instability
Authors:
Zechen Xiong,
Zihan Guo,
Li Yuan,
Yufeng Su,
Yitong Liu,
Hod Lipson
Abstract:
Robot grasping is subject to an inherent tradeoff: grippers with a large span typically take a longer time to close, and fast grippers usually cover a small span. However, many practical applications of soft grippers require the ability to close a large distance rapidly. For example, grasping cloth typically requires pressing a wide span of fabric into a graspable cusp. Here, we demonstrate a human-finger-inspired snapping gripper that exploits elastic instability to achieve reversible rapid closure over a wide span. Using prestressed semi-rigid material as the skeleton, the gripper fingers can widely open (86 ~) and rapidly close (46 ms) following a trajectory similar to that of a thumb-index finger pinch, which is 2.7 times and 10.9 times better than the reference gripper in terms of span and speed, respectively. We derive the design principle theoretically, verify the method in simulation, and experimentally test this gripper on a variety of rigid, flexible, and limp objects, achieving good adaptivity and mechanical performance. This research helps bridge the gap between strong industry manipulators and safe human-interactive robotic hands.
Submitted 1 October, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Fast Untethered Soft Robotic Crawler with Elastic Instability
Authors:
Zechen Xiong,
Yufeng Su,
Hod Lipson
Abstract:
High-speed locomotion of animals gives them tremendous advantages in exploring, hunting, and escaping from predators in varying environments. Inspired by the fast-running gait of mammals like cheetahs and wolves, we designed and fabricated a single-servo-driven untethered soft robot that is capable of galloping at a speed of 313 mm/s or 1.56 body lengths per second (BL/s), 5.2 times and 2.6 times faster than the fastest predecessors reported in the literature in mm/s and BL/s, respectively. An in-plane prestressed hair clip mechanism (HCM) made up of semi-rigid materials like plastic is used as the supporting chassis, the compliant spine, and the muscle force amplifier of the robot at the same time, enabling the robot to be rapid and strong. The influence of factors including actuation frequency, substrates, tethering/untethering, and symmetric/asymmetric actuation is explored with experiments. Building on previous work, this paper further demonstrates the potential of HCM in addressing the speed problem of soft robots.
Submitted 14 August, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
On the Origins of Self-Modeling
Authors:
Robert Kwiatkowski,
Yuhang Hu,
Boyuan Chen,
Hod Lipson
Abstract:
Self-Modeling is the process by which an agent, such as an animal or machine, learns to create a predictive model of its own dynamics. Once captured, this self-model can then allow the agent to plan and evaluate various potential behaviors internally using the self-model, rather than using costly physical experimentation. Here, we quantify the benefits of such self-modeling against the complexity of the robot. We find an R² = 0.90 correlation between the number of degrees of freedom a robot has and the added value of self-modeling as compared to a direct learning baseline. This result may help motivate self-modeling in increasingly complex robotic systems, as well as shed light on the origins of self-modeling, and ultimately self-awareness, in animals and humans.
Submitted 5 September, 2022;
originally announced September 2022.
-
A Massively-Parallel 3D Simulator for Soft and Hybrid Robots
Authors:
Joel Clay,
Sofia Wyetzner,
Alex Gaudio,
Boxi Xia,
Andrew Moshova,
Jacob Austin,
Max Segan,
Hod Lipson
Abstract:
Simulation is an important step in robotics for creating control policies and testing various physical parameters. Soft robotics is a field that presents unique physical challenges for simulating its subjects due to the nonlinearity of deformable material components along with other innovative, and often complex, physical properties. Because of the computational cost of simulating soft and heterogeneous objects with traditional techniques, rigid robotics simulators are not well suited to simulating soft robots. Thus, many engineers must build their own one-off simulators tailored to their system, or use existing simulators with reduced performance. In order to facilitate the development of this exciting technology, this work presents an interactive-speed, accurate, and versatile simulator for a variety of types of soft robots. Cronos, our open-source 3D simulation engine, parallelizes a mass-spring model for ultra-fast performance on both deformable and rigid objects. Our approach is applicable to a wide array of nonlinear material configurations, including high deformability, volumetric actuation, or heterogeneous stiffness. This versatility provides the ability to mix materials and geometric components freely within a single robot simulation. By exploiting the flexibility and scalability of nonlinear Hookean mass-spring systems, this framework simulates soft and rigid objects via a highly parallel model for near real-time speed. We describe an efficient GPU CUDA implementation, which we demonstrate to achieve computation of over 1 billion elements per second on consumer-grade GPU cards. Dynamic physical accuracy of the system is validated by comparing results to Euler-Bernoulli beam theory, natural frequency predictions, and empirical data of a soft structure under large deformation.
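As a rough illustration of the mass-spring idea described in the abstract above, the following Python sketch advances two point masses joined by a single linear (Hookean) spring on a line, using semi-implicit Euler integration. It is not Cronos code: the function name `spring_step`, its parameters, and the choice of integrator are assumptions made for illustration only. A GPU implementation would presumably apply the same kind of per-spring update to many elements in parallel.

```python
def spring_step(x, v, k, rest, m, dt):
    """Advance two equal point masses joined by one Hookean spring (1D).

    x, v  : two-element lists of positions and velocities
    k     : spring stiffness, rest: rest length, m: mass of each point
    dt    : time step (semi-implicit Euler: update velocity, then position)
    All names here are hypothetical, chosen for this sketch.
    """
    stretch = (x[1] - x[0]) - rest   # positive when the spring is extended
    force = k * stretch              # pulls mass 0 forward and mass 1 backward
    new_v = [v[0] + dt * force / m, v[1] - dt * force / m]
    new_x = [x[0] + dt * new_v[0], x[1] + dt * new_v[1]]
    return new_x, new_v


# A spring at its rest length produces no motion.
print(spring_step([0.0, 1.0], [0.0, 0.0], k=10.0, rest=1.0, m=1.0, dt=0.01))
```

Because the two forces are equal and opposite, the center of mass of the pair stays fixed in this sketch, which is a quick sanity check for any mass-spring update.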
Submitted 19 July, 2022;
originally announced July 2022.
-
Fast Swimming Robots Based on Elastic Instability
Authors:
Zechen Xiong,
Liqi Chen,
Wenxiong Hao,
Yufeng Su,
Hod Lipson
Abstract:
Inspired by the snap-through action of a steel hairclip, we propose a design method for in-plane prestressed mechanisms that exhibit biomimetic morphing and high locomotion performance. Compliant bistable flapping mechanisms are fabricated using this method and are mounted on our untethered soft robotic fish. Using this mechanism, we achieve life-like undulation with a Strouhal number (1) of St = 0.28 and a velocity of 2.03 body lengths per second (43.6 cm/s), a three-fold improvement over past compliant fish robots. A tethered pneumatic version indicates that this mechanism is compatible with soft actuators. We study the mechanism both computationally and experimentally and suggest that elastic instability may offer a path to overcome the speed challenge of soft and compliant robots.
Submitted 6 November, 2023; v1 submitted 17 July, 2022;
originally announced July 2022.
-
Egocentric Visual Self-Modeling for Autonomous Robot Dynamics Prediction and Adaptation
Authors:
Yuhang Hu,
Boyuan Chen,
Hod Lipson
Abstract:
The ability of robots to model their own dynamics is key to autonomous planning and learning, as well as for autonomous damage detection and recovery. Traditionally, dynamic models are pre-programmed or learned from external observations. Here, we demonstrate for the first time how a task-agnostic dynamic self-model can be learned using only a single first-person-view camera in a self-supervised manner, without any prior knowledge of robot morphology, kinematics, or task. Through experiments on a 12-DoF robot, we demonstrate the capabilities of the model in basic locomotion tasks using visual input. Notably, the robot can autonomously detect anomalies, such as damaged components, and adapt its behavior, showcasing resilience in dynamic environments. Furthermore, the model's generalizability was validated across robots with different configurations, emphasizing its potential as a universal tool for diverse robotic systems. The egocentric visual self-model proposed in our work paves the way for more autonomous, adaptable, and resilient robotic systems.
Submitted 15 March, 2024; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Discovering State Variables Hidden in Experimental Data
Authors:
Boyuan Chen,
Kuang Huang,
Sunand Raghupathi,
Ishaan Chandratreya,
Qiang Du,
Hod Lipson
Abstract:
All physical laws are described as relationships between state variables that give a complete and non-redundant description of the relevant system dynamics. However, despite the prevalence of computing power and AI, the process of identifying the hidden state variables themselves has resisted automation. Most data-driven methods for modeling physical phenomena still assume that observed data streams already correspond to relevant state variables. A key challenge is to identify the possible sets of state variables from scratch, given only high-dimensional observational data. Here we propose a new principle for determining how many state variables an observed system is likely to have, and what these variables might be, directly from video streams. We demonstrate the effectiveness of this approach using video recordings of a variety of physical dynamical systems, ranging from elastic double pendulums to fire flames. Without any prior knowledge of the underlying physics, our algorithm discovers the intrinsic dimension of the observed dynamics and identifies candidate sets of state variables. We suggest that this approach could help catalyze the understanding, prediction and control of increasingly complex systems. Project website is at: https://www.cs.columbia.edu/~bchen/neural-state-variables
Submitted 20 December, 2021;
originally announced December 2021.
-
Visual design intuition: Predicting dynamic properties of beams from raw cross-section images
Authors:
Philippe M. Wyder,
Hod Lipson
Abstract:
In this work we aim to mimic the human ability to acquire the intuition to estimate the performance of a design from visual inspection and experience alone. We study the ability of convolutional neural networks to predict static and dynamic properties of cantilever beams directly from their raw cross-section images. Using pixels as the only input, the resulting models learn to predict beam properties such as volume maximum deflection and eigenfrequencies with 4.54% and 1.43% Mean Average Percentage Error (MAPE) respectively, compared to the Finite Element Analysis (FEA) approach. Training these models doesn't require prior knowledge of theory or relevant geometric properties, but rather relies solely on simulated or empirical data, thereby making predictions based on "experience" as opposed to theoretical knowledge. Since this approach is over 1000 times faster than FEA, it can be adopted to create surrogate models that could speed up the preliminary optimization studies where numerous consecutive evaluations of similar geometries are required. We suggest that this modeling approach would aid in addressing challenging optimization problems involving complex structures and physical phenomena for which theoretical models are unavailable.
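For readers unfamiliar with the error metric quoted above, here is a minimal Python sketch of a percentage-error calculation. It assumes MAPE carries its usual meaning (the mean of absolute relative errors, expressed in percent) and that the FEA results serve as the reference values. The function name `mape` and the sample numbers are hypothetical and are not taken from the paper.

```python
def mape(predicted, reference):
    """Mean absolute percentage error of `predicted` against `reference`, in percent.

    Assumes every reference value is non-zero.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


# Hypothetical values: model predictions versus reference (e.g. FEA) results.
print(mape([110.0, 90.0], [100.0, 100.0]))
```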
Submitted 13 November, 2021;
originally announced November 2021.
-
Full-Body Visual Self-Modeling of Robot Morphologies
Authors:
Boyuan Chen,
Robert Kwiatkowski,
Carl Vondrick,
Hod Lipson
Abstract:
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions. These "self-models" allow robots to consider outcomes of multiple possible future actions, without trying them out in physical reality. Recent progress in fully data-driven self-modeling has enabled machines to learn their own forward kinematics directly from task-agnostic interaction data. However, forward-kinematics models can only predict limited aspects of the morphology, such as the position of end effectors or velocity of joints and masses. A key challenge is to model the entire morphology and kinematics, without prior knowledge of what aspects of the morphology will be relevant to future tasks. Here, we propose that instead of directly modeling forward-kinematics, a more useful form of self-modeling is one that could answer space occupancy queries, conditioned on the robot's state. Such query-driven self models are continuous in the spatial domain, memory efficient, fully differentiable and kinematic aware. In physical experiments, we demonstrate how a visual self-model is accurate to about one percent of the workspace, enabling the robot to perform various motion planning and control tasks. Visual self-modeling can also allow the robot to detect, localize and recover from real-world damage, leading to improved machine resiliency. Our project website is at: https://robot-morphology.cs.columbia.edu/
Submitted 21 November, 2021; v1 submitted 11 November, 2021;
originally announced November 2021.
-
Smile Like You Mean It: Driving Animatronic Robotic Face with Learned Models
Authors:
Boyuan Chen,
Yuhang Hu,
Lianfeng Li,
Sara Cummings,
Hod Lipson
Abstract:
Ability to generate intelligent and generalizable facial expressions is essential for building human-like social robots. At present, progress in this field is hindered by the fact that each facial expression needs to be programmed by humans. In order to adapt robot behavior in real time to different situations that arise when interacting with human subjects, robots need to be able to train themselves without requiring human labels, as well as make fast action decisions and generalize the acquired knowledge to diverse and new contexts. We addressed this challenge by designing a physical animatronic robotic face with soft skin and by developing a vision-based self-supervised learning framework for facial mimicry. Our algorithm does not require any knowledge of the robot's kinematic model, camera calibration or predefined expression set. By decomposing the learning process into a generative model and an inverse model, our framework can be trained using a single motor babbling dataset. Comprehensive evaluations show that our method enables accurate and diverse face mimicry across diverse human subjects. The project website is at http://www.cs.columbia.edu/~bchen/aiface/
Submitted 26 May, 2021;
originally announced May 2021.
-
The Boombox: Visual Reconstruction from Acoustic Vibrations
Authors:
Boyuan Chen,
Mia Chiquier,
Hod Lipson,
Carl Vondrick
Abstract:
Interacting with bins and containers is a fundamental task in robotics, making state estimation of the objects inside the bin critical. While robots often use cameras for state estimation, the visual modality is not always ideal due to occlusions and poor illumination. We introduce The Boombox, a container that uses sound to estimate the state of the contents inside a box. Based on the observation that collisions between objects and their container cause acoustic vibrations, we present a convolutional network for learning to reconstruct visual scenes. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multimodal data enables state estimation from affordable audio sensors. Due to the many ways that robots use containers, we believe the box will have a number of applications in robotics. Our project website is at: boombox.cs.columbia.edu
Submitted 23 October, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Visual Perspective Taking for Opponent Behavior Modeling
Authors:
Boyuan Chen,
Yuhang Hu,
Robert Kwiatkowski,
Shuran Song,
Hod Lipson
Abstract:
In order to engage in complex social interaction, humans learn at a young age to infer what others see and cannot see from a different point-of-view, and learn to predict others' plans and behaviors. These abilities have been mostly lacking in robots, sometimes making them appear awkward and socially inept. Here we propose an end-to-end long-term visual prediction framework for robots to begin to acquire both these critical cognitive skills, known as Visual Perspective Taking (VPT) and Theory of Behavior (TOB). We demonstrate our approach in the context of visual hide-and-seek - a game that represents a cognitive milestone in human development. Unlike traditional visual predictive models that generate new frames from immediate past frames, our agent can directly predict to multiple future timestamps (25s), extrapolating by 175% beyond the training horizon. We suggest that visual behavior modeling and perspective taking skills will play a critical role in the ability of physical robots to fully integrate into real-world multi-agent activities. Our website is at http://www.cs.columbia.edu/~bchen/vpttob/.
Submitted 11 May, 2021;
originally announced May 2021.
-
Beyond Categorical Label Representations for Image Classification
Authors:
Boyuan Chen,
Yu Li,
Sunand Raghupathi,
Hod Lipson
Abstract:
We find that the way we choose to represent data labels can have a profound effect on the quality of trained models. For example, training an image classifier to regress audio labels rather than traditional categorical probabilities produces a more reliable classification. This result is surprising, considering that audio labels are more complex than simpler numerical probabilities or text. We hypothesize that high dimensional, high entropy label representations are generally more useful because they provide a stronger error signal. We support this hypothesis with evidence from various label representations including constant matrices, spectrograms, shuffled spectrograms, Gaussian mixtures, and uniform random matrices of various dimensionalities. Our experiments reveal that high dimensional, high entropy labels achieve comparable accuracy to text (categorical) labels on the standard image classification task, but features learned through our label representations exhibit more robustness under various adversarial attacks and better effectiveness with a limited amount of training data. These results suggest that label representation may play a more important role than previously thought. The project website is at https://www.creativemachineslab.com/label-representation.html.
Submitted 5 April, 2021;
originally announced April 2021.
-
A Legged Soft Robot Platform for Dynamic Locomotion
Authors:
Boxi Xia,
Jiaming Fu,
Hongbo Zhu,
Zhicheng Song,
Yibo Jiang,
Hod Lipson
Abstract:
We present an open-source untethered quadrupedal soft robot platform for dynamic locomotion (e.g., high-speed running and backflipping). The robot is mostly soft (80 vol.%) while driven by four geared servo motors. The robot's soft body and soft legs were 3D printed with gyroid infill using a flexible material, enabling it to conform to the environment and passively stabilize during locomotion on multi-terrain environments. In addition, we simulated the robot in a real-time soft body simulation. With tuned gaits in simulation, the real robot can locomote at a speed of 0.9 m/s (2.5 body lengths/second), substantially faster than most untethered legged soft robots published to date. We hope this platform, along with its verified simulator, can catalyze the development of soft robotics.
Submitted 25 March, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.
-
Titan: A Parallel Asynchronous Library for Multi-Agent and Soft-Body Robotics using NVIDIA CUDA
Authors:
Jacob Austin,
Rafael Corrales-Fatou,
Sofia Wyetzner,
Hod Lipson
Abstract:
While most robotics simulation libraries are built for low-dimensional and intrinsically serial tasks, soft-body and multi-agent robotics have created a demand for simulation environments that can model many interacting bodies in parallel. Despite the increasing interest in these fields, no existing simulation library addresses the challenge of providing a unified, highly-parallelized, GPU-accelerated interface for simulating large robotic systems. Titan is a versatile CUDA-based C++ robotics simulation library that employs a novel asynchronous computing model for GPU-accelerated simulations of robotics primitives. The innovative GPU architecture design permits simultaneous optimization and control on the CPU while the GPU runs asynchronously, enabling rapid topology optimization and reinforcement learning iterations. Kinematics are solved with a massively parallel integration scheme that incorporates constraints and environmental forces. We report dramatically improved performance over CPU-based baselines, simulating as many as 300 million primitive updates per second, while allowing flexibility for a wide range of research applications. We present several applications of Titan to high-performance simulations of soft-body and multi-agent robots.
Submitted 22 November, 2019;
originally announced November 2019.
-
Visual Hide and Seek
Authors:
Boyuan Chen,
Shuran Song,
Hod Lipson,
Carl Vondrick
Abstract:
We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator. We place a variety of obstacles in the environment for the prey to hide behind, and we only give the agents partial observations of their environment using an egocentric perspective. Although we train the model to play this game from scratch, experiments and visualizations suggest that the agent learns to predict its own visibility in the environment. Furthermore, we quantitatively analyze how agent weaknesses, such as slower speed, affect the learned policy. Our results suggest that, although agent weaknesses make the learning problem more challenging, they also cause more useful features to be learned. Our project website is available at: http://www.cs.columbia.edu/~bchen/visualhideseek/.
Submitted 14 October, 2019;
originally announced October 2019.
-
Zero Shot Learning on Simulated Robots
Authors:
Robert Kwiatkowski,
Hod Lipson
Abstract:
In this work we present a method for leveraging data from one source to learn how to do multiple new tasks. Task transfer is achieved using a self-model that encapsulates the dynamics of a system and serves as an environment for reinforcement learning. To study this approach, we train self-models on various robot morphologies, using randomly sampled actions. Using a self-model, an initial state and corresponding actions, we can predict the next state. This predictive self-model is then used by a standard reinforcement learning algorithm to accomplish tasks without ever seeing a state from the "real" environment. These trained policies allow the robots to successfully achieve their goals in the "real" environment. We demonstrate that not only is training on the self-model far more data efficient than learning even a single task, but also that it allows for learning new tasks without necessitating any additional data collection, essentially allowing zero-shot learning of new tasks.
Submitted 4 October, 2019;
originally announced October 2019.
-
Automated Weed Detection in Aerial Imagery with Context
Authors:
Delia Bullock,
Andrew Mangeni,
Tyr Wiesner-Hanks,
Chad DeChant,
Ethan L. Stewart,
Nicholas Kaczmar,
Judith M. Kolkman,
Rebecca J. Nelson,
Michael A. Gore,
Hod Lipson
Abstract:
In this paper, we demonstrate the ability to discriminate between cultivated maize plant and grass or grass-like weed image segments using the context surrounding the image segments. While convolutional neural networks have brought state-of-the-art accuracies within object detection, errors arise when objects in different classes share similar features. This scenario often occurs when objects in images are viewed at too small a scale to discern distinct differences in features, causing images to be incorrectly classified or localized. To solve this problem, we explore using context when classifying image segments. This technique involves feeding a convolutional neural network a central square image along with a border of its direct surroundings at train and test times. This means that although images are labelled at a smaller scale to preserve accurate localization, the network classifies the images and learns features that include the wider context. We demonstrate the benefits of this context technique in the object detection task through a case study of grass (foxtail) and grass-like (yellow nutsedge) weed detection in maize fields. In this standard situation, adding context alone nearly halved the error of the neural network from 7.1% to 4.3%. After only one epoch with context, the network also achieved a higher accuracy than the network without context did after 50 epochs. The benefits of using the context technique are likely to be particularly evident in agricultural contexts in which parts (such as leaves) of several plants may appear similar when not taking into account the context in which those parts appear.
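The context technique described above (a central square plus a border of its direct surroundings) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `crop_with_context`, the list-of-rows image layout, and the choice to replicate edge pixels where the border falls outside the image are all hypothetical, since the abstract does not say how image boundaries are handled.

```python
def crop_with_context(image, row, col, size, border):
    """Return the size x size tile whose top-left corner is (row, col),
    enlarged by `border` pixels of surrounding context on every side.

    `image` is a list of rows. Pixels outside the image are filled by
    repeating the nearest edge pixel (an assumption made for this sketch).
    """
    height, width = len(image), len(image[0])

    def clamp(value, upper):
        return max(0, min(value, upper - 1))

    rows = range(row - border, row + size + border)
    cols = range(col - border, col + size + border)
    return [[image[clamp(r, height)][clamp(c, width)] for c in cols] for r in rows]


# A 4x4 toy image whose pixel value encodes its position (10 * row + column).
toy = [[10 * r + c for c in range(4)] for r in range(4)]
patch = crop_with_context(toy, row=1, col=1, size=2, border=1)
print(len(patch), len(patch[0]))  # the 2x2 labelled tile grows to 4x4 with context
```

The label would still refer only to the central tile; the extra border only widens what the classifier sees.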
Submitted 19 November, 2019; v1 submitted 1 October, 2019;
originally announced October 2019.
-
Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network
Authors:
Oscar Chang,
Yuling Yao,
David Williams-King,
Hod Lipson
Abstract:
Two main obstacles preventing the widespread adoption of variational Bayesian neural networks are the high parameter overhead that makes them infeasible on large networks, and the difficulty of implementation, which can be thought of as "programming overhead." MC dropout [Gal and Ghahramani, 2016] is popular because it sidesteps these obstacles. Nevertheless, dropout is often harmful to model performance when used in networks with batch normalization layers [Li et al., 2018], which are an indispensable part of modern neural networks. We construct a general variational family for ensemble-based Bayesian neural networks that encompasses dropout as a special case. We further present two specific members of this family that work well with batch normalization layers, while retaining the benefits of low parameter and programming overhead, comparable to non-Bayesian training. Our proposed methods improve predictive accuracy and achieve almost perfect calibration on a ResNet-18 trained with ImageNet.
Submitted 22 May, 2019;
originally announced May 2019.
-
Seven Myths in Machine Learning Research
Authors:
Oscar Chang,
Hod Lipson
Abstract:
We present seven myths commonly believed to be true in machine learning research, circa Feb 2019. This is an archival copy of the blog post at https://crazyoscarchang.github.io/2019/02/16/seven-myths-in-machine-learning-research/
Myth 1: TensorFlow is a Tensor manipulation library
Myth 2: Image datasets are representative of real images found in the wild
Myth 3: Machine Learning researchers do not use the test set for validation
Myth 4: Every datapoint is used in training a neural network
Myth 5: We need (batch) normalization to train very deep residual networks
Myth 6: Attention $>$ Convolution
Myth 7: Saliency maps are robust ways to interpret neural networks
Submitted 22 February, 2019; v1 submitted 18 February, 2019;
originally announced February 2019.
-
Agent Embeddings: A Latent Representation for Pole-Balancing Networks
Authors:
Oscar Chang,
Robert Kwiatkowski,
Siyuan Chen,
Hod Lipson
Abstract:
We show that it is possible to reduce a high-dimensional object like a neural network agent into a low-dimensional vector representation with semantic meaning that we call agent embeddings, akin to word or face embeddings. This can be done by collecting examples of existing networks, vectorizing their weights, and then learning a generative model over the weight space in a supervised fashion. We investigate a pole-balancing task, Cart-Pole, as a case study and show that multiple new pole-balancing networks can be generated from their agent embeddings without direct access to training data from the Cart-Pole simulator. In general, the learned embedding space is helpful for mapping out the space of solutions for a given task. We observe in the case of Cart-Pole the surprising finding that good agents make different decisions despite learning similar representations, whereas bad agents make similar (bad) decisions while learning dissimilar representations. Linearly interpolating between the latent embeddings for a good agent and a bad agent yields an agent embedding that generates a network with intermediate performance, where the performance can be tuned according to the coefficient of interpolation. Linear extrapolation in the latent space also results in performance boosts, up to a point.
Submitted 18 March, 2019; v1 submitted 11 November, 2018;
originally announced November 2018.
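The linear interpolation and extrapolation between agent embeddings mentioned in the abstract above can be illustrated with plain vectors. This is only a sketch of the arithmetic: the function name `blend_embeddings`, the two-dimensional toy embeddings, and the convention that the coefficient runs from the good agent (0) to the bad agent (1) are assumptions, and the generative model that decodes an embedding into network weights is not shown.

```python
def blend_embeddings(z_good, z_bad, alpha):
    """Return (1 - alpha) * z_good + alpha * z_bad, element by element.

    alpha in [0, 1] interpolates between the two embeddings.
    alpha < 0 extrapolates past z_good, away from z_bad.
    """
    return [(1 - alpha) * g + alpha * b for g, b in zip(z_good, z_bad)]


# Hypothetical 2-D embeddings, chosen only to make the arithmetic easy to follow.
z_good = [0.0, 2.0]
z_bad = [4.0, 0.0]

midpoint = blend_embeddings(z_good, z_bad, 0.5)       # halfway between the agents
beyond_good = blend_embeddings(z_good, z_bad, -0.5)   # extrapolated past the good agent
```

In the setting of the abstract, each blended vector would then be passed to the learned generative model to produce a network whose performance depends on the coefficient.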
-
Neural Network Quine
Authors:
Oscar Chang,
Hod Lipson
Abstract:
Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network's ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection.
Submitted 24 May, 2018; v1 submitted 15 March, 2018;
originally announced March 2018.
-
The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities
Authors:
Joel Lehman,
Jeff Clune,
Dusan Misevic,
Christoph Adami,
Lee Altenberg,
Julie Beaulieu,
Peter J. Bentley,
Samuel Bernard,
Guillaume Beslon,
David M. Bryson,
Patryk Chrabaszcz,
Nick Cheney,
Antoine Cully,
Stephane Doncieux,
Fred C. Dyer,
Kai Olav Ellefsen,
Robert Feldt,
Stephan Fischer,
Stephanie Forrest,
Antoine Frénoy,
Christian Gagné,
Leni Le Goff,
Laura M. Grabowski,
Babak Hodjat,
Frank Hutter
, et al. (28 additional authors not shown)
Abstract:
Biological evolution provides a creative fount of complex and subtle adaptations, often surprising the scientists who discover them. However, because evolution is an algorithmic process that transcends the substrate in which it occurs, evolution's creativity is not limited to nature. Indeed, many researchers in the field of digital evolution have observed their evolving algorithms and organisms subverting their intentions, exposing unrecognized bugs in their code, producing unexpected adaptations, or exhibiting outcomes uncannily convergent with ones in nature. Such stories routinely reveal creativity by evolution in these digital worlds, but they rarely fit into the standard scientific narrative. Instead they are often treated as mere obstacles to be overcome, rather than results that warrant study in their own right. The stories themselves are traded among researchers through oral tradition, but that mode of information transmission is inefficient and prone to error and outright loss. Moreover, the fact that these stories tend to be shared only among practitioners means that many natural scientists do not realize how interesting and lifelike digital organisms are and how natural their evolution can be. To our knowledge, no collection of such anecdotes has been published before. This paper is the crowd-sourced product of researchers in the fields of artificial life and evolutionary computation who have provided first-hand accounts of such cases. It thus serves as a written, fact-checked collection of scientifically important and even entertaining stories. In doing so we also present here substantial evidence that the existence and importance of evolutionary surprises extend beyond the natural world, and may indeed be a universal property of all complex evolving systems.
Submitted 21 November, 2019; v1 submitted 9 March, 2018;
originally announced March 2018.
-
Autostacker: A Compositional Evolutionary Learning System
Authors:
Boyuan Chen,
Harvey Wu,
Warren Mo,
Ishanu Chattopadhyay,
Hod Lipson
Abstract:
We introduce an automatic machine learning (AutoML) modeling architecture called Autostacker, which combines an innovative hierarchical stacking architecture and an Evolutionary Algorithm (EA) to perform efficient parameter search. Neither prior domain knowledge about the data nor feature preprocessing is needed. Using EA, Autostacker quickly evolves candidate pipelines with high predictive accuracy. These pipelines can be used as is or as a starting point for human experts to build on. Autostacker finds innovative combinations and structures of machine learning models, rather than selecting a single model and optimizing its hyperparameters. Compared with other AutoML systems on fifteen datasets, Autostacker achieves state-of-the-art or competitive performance both in terms of test accuracy and time cost.
Submitted 1 March, 2018;
originally announced March 2018.
-
Scalable Co-Optimization of Morphology and Control in Embodied Machines
Authors:
Nick Cheney,
Josh Bongard,
Vytas SunSpiral,
Hod Lipson
Abstract:
Evolution sculpts both the body plans and nervous systems of agents together over time. In contrast, in AI and robotics, a robot's body plan is usually designed by hand, and control policies are then optimized for that fixed design. The task of simultaneously co-optimizing the morphology and controller of an embodied robot has remained a challenge. In psychology, the theory of embodied cognition posits that behavior arises from a close coupling between body plan and sensorimotor control, which suggests why co-optimizing these two subsystems is so difficult: most evolutionary changes to morphology tend to adversely impact sensorimotor control, leading to an overall decrease in behavioral performance. Here, we further examine this hypothesis and demonstrate a technique for "morphological innovation protection", which temporarily reduces selection pressure on recently morphologically-changed individuals, thus enabling evolution some time to "readapt" to the new morphology with subsequent control policy mutations. We show the potential for this method to avoid local optima and converge to similar highly fit morphologies across widely varying initial conditions, while sustaining fitness improvements further into optimization. While this technique is admittedly only the first of many steps that must be taken to achieve scalable optimization of embodied machines, we hope that theoretical insight into the cause of evolutionary stagnation in current methods will help to enable the automation of robot design and behavioral training -- while simultaneously providing a testbed to investigate the theory of embodied cognition.
Submitted 12 December, 2017; v1 submitted 19 June, 2017;
originally announced June 2017.
-
Convergent Learning: Do different neural networks learn the same representations?
Authors:
Yixuan Li,
Jason Yosinski,
Jeff Clune,
Hod Lipson,
John Hopcroft
Abstract:
Recent successes in training deep neural networks have prompted active investigation into the features learned on their intermediate layers. Such research is difficult because it requires making sense of non-linear computations performed by millions of parameters, but valuable because it increases our ability to understand current models and create improved versions of them. In this paper we investigate the extent to which neural networks exhibit what we call convergent learning, which is when the representations learned by multiple nets converge to a set of features which are either individually similar between networks or where subsets of features span similar low-dimensional spaces. We propose a specific method of probing representations: training multiple networks and then comparing and contrasting their individual, learned representations at the level of neurons or groups of neurons. We begin research into this question using three techniques to approximately align different neural networks on a feature level: a bipartite matching approach that makes one-to-one assignments between neurons, a sparse prediction approach that finds one-to-many mappings, and a spectral clustering approach that finds many-to-many mappings. This initial investigation reveals a few previously unknown properties of neural networks, and we argue that future research into the question of convergent learning will yield many more. The insights described here include (1) that some features are learned reliably in multiple networks, yet other features are not consistently learned; (2) that units learn to span low-dimensional subspaces and, while these subspaces are common to multiple networks, the specific basis vectors learned are not; (3) that the representation codes show evidence of being a mix between a local code and slightly, but not fully, distributed codes across multiple units.
Submitted 28 February, 2016; v1 submitted 23 November, 2015;
originally announced November 2015.
-
Understanding Neural Networks Through Deep Visualization
Authors:
Jason Yosinski,
Jeff Clune,
Anh Nguyen,
Thomas Fuchs,
Hod Lipson
Abstract:
Recent years have produced great advances in training large, deep neural networks (DNNs), including notable successes in training convolutional neural networks (convnets) to recognize natural images. However, our understanding of how these models work, especially what computations they perform at intermediate layers, has lagged behind. Progress in the field will be further accelerated by the development of better tools for visualizing and interpreting neural nets. We introduce two such tools here. The first is a tool that visualizes the activations produced on each layer of a trained convnet as it processes an image or video (e.g. a live webcam stream). We have found that looking at live activations that change in response to user input helps build valuable intuitions about how convnets work. The second tool enables visualizing features at each layer of a DNN via regularized optimization in image space. Because previous versions of this idea produced less recognizable images, here we introduce several new regularization methods that combine to produce qualitatively clearer, more interpretable visualizations. Both tools are open source and work on a pre-trained convnet with minimal setup.
Submitted 22 June, 2015;
originally announced June 2015.
-
How transferable are features in deep neural networks?
Authors:
Jason Yosinski,
Jeff Clune,
Yoshua Bengio,
Hod Lipson
Abstract:
Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
Submitted 6 November, 2014;
originally announced November 2014.
-
Data Smashing
Authors:
Ishanu Chattopadhyay,
Hod Lipson
Abstract:
Investigation of the underlying physics or biology from empirical data requires a quantifiable notion of similarity - when do two observed data sets indicate nearly identical generating processes, and when they do not. The discriminating characteristics to look for in data are often determined by heuristics designed by experts, e.g., distinct shapes of "folded" lightcurves may be used as "features" to classify variable stars, while determination of pathological brain states might require a Fourier analysis of brainwave activity. Finding good features is non-trivial. Here, we propose a universal solution to this problem: we delineate a principle for quantifying similarity between sources of arbitrary data streams, without a priori knowledge, features or training. We uncover an algebraic structure on a space of symbolic models for quantized data, and show that such stochastic generators may be added and uniquely inverted; and that a model and its inverse always sum to the generator of flat white noise. Therefore, every data stream has an anti-stream: data generated by the inverse model. Similarity between two streams, then, is the degree to which one, when summed to the other's anti-stream, mutually annihilates all statistical structure to noise. We call this data smashing. We present diverse applications, including disambiguation of brainwaves pertaining to epileptic seizures, detection of anomalous cardiac rhythms, and classification of astronomical objects from raw photometry. In our examples, the data smashing principle, without access to any domain knowledge, meets or exceeds the performance of specialized algorithms tuned by domain experts.
Submitted 3 January, 2014;
originally announced January 2014.
-
Computing Entropy Rate Of Symbol Sources & A Distribution-free Limit Theorem
Authors:
Ishanu Chattopadhyay,
Hod Lipson
Abstract:
Entropy rate of sequential data-streams naturally quantifies the complexity of the generative process. Thus entropy rate fluctuations could be used as a tool to recognize dynamical perturbations in signal sources, and could potentially be carried out without explicit background noise characterization. However, state-of-the-art algorithms to estimate the entropy rate have markedly slow convergence, making such entropic approaches non-viable in practice. We present here a fundamentally new approach to estimate entropy rates, which is demonstrated to converge significantly faster in terms of input data lengths, and is shown to be effective in diverse applications ranging from the estimation of the entropy rate of English texts to the estimation of complexity of chaotic dynamical systems. Additionally, the convergence rate of entropy estimates does not follow from any standard limit theorem, and reported algorithms fail to provide any confidence bounds on the computed values. Exploiting a connection to the theory of probabilistic automata, we establish a convergence rate of $O(\log \vert s \vert/\sqrt[3]{\vert s \vert})$ as a function of the input length $\vert s \vert$, which then yields explicit uncertainty estimates, as well as required data lengths to satisfy pre-specified confidence bounds.
Submitted 21 March, 2014; v1 submitted 3 January, 2014;
originally announced January 2014.
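To give a feel for the $O(\log \vert s \vert/\sqrt[3]{\vert s \vert})$ convergence rate stated in the abstract above, the sketch below evaluates that expression and searches for an input length that brings it under a chosen tolerance. The multiplicative constant hidden by the big-O notation is not given in the abstract, so `constant=1.0` is a placeholder assumption, and the function names are hypothetical.

```python
import math


def convergence_bound(length, constant=1.0):
    """Evaluate constant * log(length) / length**(1/3) for `length` input symbols.

    `constant` stands in for the unspecified big-O constant (an assumption).
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    return constant * math.log(length) / length ** (1.0 / 3.0)


def required_length(tolerance, constant=1.0, start=32):
    """Double `start` until the bound drops to `tolerance` or below.

    log(n) / n**(1/3) decreases for n > e**3 (about 20), so starting at 32
    keeps the search on the decreasing part of the curve.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    length = start
    while convergence_bound(length, constant) > tolerance:
        length *= 2
    return length


short_stream = convergence_bound(10 ** 3)
long_stream = convergence_bound(10 ** 6)
needed = required_length(0.5)
```

Because the cube root grows slowly, tightening the tolerance by a modest factor requires a much longer input stream, which is why explicit data-length requirements are useful in practice.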
-
Particle Shape Effects on the Stress Response of Granular Packings
Authors:
Athanasios G. Athanassiadis,
Marc Z. Miskin,
Paul Kaplan,
Nicholas Rodenberg,
Seung Hwan Lee,
Jason Merritt,
Eric Brown,
John Amend,
Hod Lipson,
Heinrich M. Jaeger
Abstract:
We present measurements of the stress response of packings formed from a wide range of particle shapes. Besides spheres these include convex shapes such as the Platonic solids, truncated tetrahedra, and triangular bipyramids, as well as more complex, non-convex geometries such as hexapods with various arm lengths, dolos, and tetrahedral frames. All particles were 3D-printed in hard resin. Well-defined initial packing states were established through preconditioning by cyclic loading under given confinement pressure. Starting from such initial states, stress-strain relationships for axial compression were obtained at four different confining pressures for each particle type. While confining pressure has the largest overall effect on the mechanical response, we find that particle shape controls the details of the stress-strain curves and can be used to tune packing stiffness and yielding. By correlating the experimentally measured values for the effective Young's modulus under compression, yield stress and energy loss during cyclic loading, we identify trends among the various shapes that allow for designing a packing's aggregate behavior.
Submitted 15 October, 2013; v1 submitted 9 August, 2013;
originally announced August 2013.
-
Hands-free Evolution of 3D-printable Objects via Eye Tracking
Authors:
Nick Cheney,
Jeff Clune,
Jason Yosinski,
Hod Lipson
Abstract:
Interactive evolution has shown the potential to create amazing and complex forms in both 2-D and 3-D settings. However, the algorithm is slow and users quickly become fatigued. We propose that the use of eye tracking for interactive evolution systems will both reduce user fatigue and improve evolutionary success. We describe a systematic method for testing the hypothesis that eye tracking driven interactive evolution will be a more successful and easier-to-use design method than traditional interactive evolution methods driven by mouse clicks. We provide preliminary results that support the possibility of this proposal, and lay out future work to investigate these advantages in extensive clinical trials.
Submitted 19 April, 2013; v1 submitted 17 April, 2013;
originally announced April 2013.