
Adaptive path-integral autoencoder: representation learning and planning for dynamical systems. (English) Zbl 1459.62164

Summary: We present a representation learning algorithm that learns a low-dimensional latent dynamical system from high-dimensional sequential raw data, e.g. video. The framework builds upon recent advances in amortized inference methods that use both an inference network and a refinement procedure to output samples from a variational distribution given an observation sequence, and takes advantage of the duality between control and inference to approximately solve the intractable inference problem using the path integral control approach. The learned dynamical model can be used to predict and plan future states; we also present an efficient planning method that exploits the learned low-dimensional latent dynamics. Numerical experiments show that the proposed path-integral-control-based variational inference method leads to tighter lower bounds in statistical model learning of sequential data. The supplementary video (https://youtu.be/xCp35crUoLQ) and the implementation code (https://github.com/yjparkLiCS/18-NIPS-APIAE) are available online.
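The "tighter lower bound" claim is in the spirit of importance-weighted objectives [4, 24]: averaging importance weights inside the logarithm yields a bound at least as tight as the standard ELBO. A minimal numerical sketch of that inequality (function name and toy weights are illustrative, not from the paper's implementation):

```python
import numpy as np

def elbo_and_iwae(log_weights):
    """Given per-sample log importance weights log w_k = log p(x, z_k) - log q(z_k | x),
    return the standard ELBO (mean of log-weights) and the K-sample
    importance-weighted bound log((1/K) * sum_k w_k), computed stably."""
    log_w = np.asarray(log_weights, dtype=float)
    elbo = log_w.mean()
    # log-mean-exp, stabilized by subtracting the max log-weight
    m = log_w.max()
    iwae = m + np.log(np.exp(log_w - m).mean())
    return elbo, iwae

rng = np.random.default_rng(0)
log_w = rng.normal(loc=-5.0, scale=1.0, size=64)  # toy log-weights
elbo, iwae = elbo_and_iwae(log_w)
assert iwae >= elbo  # Jensen's inequality: the averaged bound is tighter
```

In the paper's setting the importance weights come from path-integral-controlled trajectory samples of the latent dynamics rather than i.i.d. draws, but the same log-mean-exp structure underlies the bound.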

MSC:

62M09 Non-Markovian processes: estimation
62L12 Sequential estimation
62-04 Software, source code, etc. for problems pertaining to statistics
94A12 Signal theory (characterization, reconstruction, filtering, etc.)

Software:

GitHub; TensorFlow

References:

[1] Abadi M et al 2016 TensorFlow: a system for large-scale machine learning OSDI vol 16 pp 265-83
[2] Banijamali E, Shu R, Ghavamzadeh M, Bui H and Ghodsi A 2018 Robust locally-linear controllable embedding Int. Conf. Artificial Intelligence and Statistics
[3] Bellman R 1966 Dynamic programming Science 153 33-7 · doi:10.1126/science.153.3731.34
[4] Burda Y, Grosse R and Salakhutdinov R 2016 Importance weighted autoencoders Int. Conf. on Learning Representations
[5] Chen N, Karl M and van der Smagt P 2016 Dynamic movement primitives in latent space of time-dependent variational autoencoders Int. Conf. on Humanoid Robots pp 629-36
[6] Cremer C, Morris Q and Duvenaud D 2017 Reinterpreting importance-weighted autoencoders ICLR Workshop
[7] Cremer C, Li X and Duvenaud D 2018 Inference suboptimality in variational autoencoders (arXiv:1801.03558)
[8] Fraccaro M, Kamronn S, Paquet U and Winther O 2017 A disentangled recognition and nonlinear dynamics model for unsupervised learning Advances in Neural Information Processing Systems pp 3604-13
[9] Genewein T, Leibfried F, Grau-Moya J and Braun D A 2015 Bounded rationality, abstraction, and hierarchical decision-making: an information-theoretic optimality principle Frontiers Robot. AI 2 27 · doi:10.3389/frobt.2015.00027
[10] Ha J-S, Chae H-J and Choi H-L 2018 Approximate inference-based motion planning by learning and exploiting low-dimensional latent variable models pp 3892-9 · doi:10.1109/LRA.2018.2856915
[11] Hjelm D, Salakhutdinov R R, Cho K, Jojic N, Calhoun V and Chung J 2016 Iterative refinement of the approximate posterior for directed belief networks Advances in Neural Information Processing Systems pp 4691-9
[12] Ichter B, Harrison J and Pavone M 2018 Learning sampling distributions for robot motion planning Int. Conf. on Robotics and Automation (ICRA) · doi:10.1109/ICRA.2018.8460730
[13] Johnson M, Duvenaud D K, Wiltschko A, Adams R P and Datta S R 2016 Composing graphical models with neural networks for structured representations and fast inference Advances in Neural Information Processing Systems (NIPS) pp 2946-54
[14] Jonschkowski R and Brock O 2015 Learning state representations with robotic priors Auton. Robots 39 407-28 · doi:10.1007/s10514-015-9459-7
[15] Kappen H J and Ruiz H C 2016 Adaptive importance sampling for control and inference J. Stat. Phys. 162 1244-66 · Zbl 1338.93166 · doi:10.1007/s10955-016-1446-7
[16] Karkus P, Hsu D and Lee W S 2017 QMDP-net: deep learning for planning under partial observability Advances in Neural Information Processing Systems pp 4697-707
[17] Karl M, Soelch M, Bayer J and van der Smagt P 2017 Deep variational Bayes filters: unsupervised learning of state space models from raw data Int. Conf. on Learning Representations
[18] Kim Y, Wiseman S, Miller A C, Sontag D and Rush A M 2018 Semi-amortized variational autoencoders (arXiv:1802.02550)
[19] Kingma D P and Welling M 2014 Auto-encoding variational Bayes Int. Conf. on Learning Representations
[20] Krishnan R G, Shalit U and Sontag D 2017 Structured inference networks for nonlinear state space models AAAI pp 2101-9
[21] Krishnan R G, Liang D and Hoffman M 2018 On the challenges of learning with inference networks on sparse, high-dimensional data Int. Conf. on Artificial Intelligence and Statistics
[22] Le T A, Igl M, Jin T, Rainforth T and Wood F 2018 Auto-encoding sequential Monte Carlo Int. Conf. on Learning Representations
[23] Lesort T, Díaz-Rodríguez N, Goudou J-F and Filliat D 2018 State representation learning for control: an overview Neural Networks 108 379-92 · doi:10.1016/j.neunet.2018.07.006
[24] Maddison C J, Lawson D, Tucker G, Heess N, Norouzi M, Mnih A, Doucet A and Teh Y W 2017 Filtering variational objectives Advances in Neural Information Processing Systems
[25] Mnih A and Rezende D 2016 Variational inference for Monte Carlo objectives Int. Conf. on Machine Learning pp 2188-96
[26] Naesseth C A, Linderman S W, Ranganath R and Blei D M 2018 Variational sequential Monte Carlo Int. Conf. on Artificial Intelligence and Statistics
[27] Okada M, Rigazio L and Aoshima T 2017 Path integral networks: end-to-end differentiable optimal control (arXiv:1706.09597)
[28] Rainforth T, Kosiorek A R, Le T A, Maddison C J, Igl M, Wood F and Teh Y W 2018 Tighter variational bounds are not necessarily better Int. Conf. on Machine Learning
[29] Rezende D J, Mohamed S and Wierstra D 2014 Stochastic backpropagation and approximate inference in deep generative models Int. Conf. on Machine Learning pp 1278-86
[30] Ruiz H-C and Kappen H J 2017 Particle smoothing for hidden diffusion processes: adaptive path integral smoother IEEE Trans. Signal Process. 65 3191-203 · Zbl 1414.94525 · doi:10.1109/TSP.2017.2686340
[31] Tamar A, Wu Y, Thomas G, Levine S and Abbeel P 2016 Value iteration networks Advances in Neural Information Processing Systems pp 2154-62
[32] Tassa Y et al 2018 DeepMind control suite (arXiv:1801.00690)
[33] Thijssen S and Kappen H J 2015 Path integral control and state-dependent feedback Phys. Rev. E 91 032104 · doi:10.1103/PhysRevE.91.032104
[34] Todorov E 2008 General duality between optimal control and estimation IEEE Conf. Decision and Control pp 4286-92
[35] Todorov E 2009 Efficient computation of optimal actions Proc. Natl Acad. Sci. 106 11478-83 · Zbl 1203.68327 · doi:10.1073/pnas.0710743106
[36] Vernaza P and Lee D D 2012 Learning and exploiting low-dimensional structure for efficient holonomic motion planning in high-dimensional spaces Int. J. Robot. Res. 31 1739-60 · doi:10.1177/0278364912457436
[37] Wang J M, Fleet D J and Hertzmann A 2008 Gaussian process dynamical models for human motion IEEE Trans. Pattern Anal. Mach. Intell. 30 283-98 · doi:10.1109/TPAMI.2007.1167
[38] Watter M, Springenberg J, Boedecker J and Riedmiller M 2015 Embed to control: a locally linear latent dynamics model for control from raw images Advances in Neural Information Processing Systems pp 2746-54
[39] Zhang C, Huh J and Lee D D 2018 Learning implicit sampling distributions for motion planning (arXiv:1806.01968)
[40] Ziebart B D, Maas A L, Bagnell J A and Dey A K 2008 Maximum entropy inverse reinforcement learning AAAI vol 8 pp 1433-8
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or perfect matching.