
Tutorial on amortized optimization. (English) Zbl 1525.68107

Summary: Optimization is a ubiquitous modeling tool and is often deployed in settings that repeatedly solve similar instances of the same problem. Amortized optimization methods use learning to predict the solutions to problems in these settings, exploiting the shared structure between similar problem instances. These methods have been crucial in variational inference and reinforcement learning and are capable of solving optimization problems many orders of magnitude faster than traditional optimization methods that do not use amortization. This tutorial presents an introduction to the amortized optimization foundations behind these advancements and surveys their applications in variational inference, sparse coding, gradient-based meta-learning, control, reinforcement learning, convex optimization, optimal transport, and deep equilibrium networks. The source code for this tutorial is available at https://github.com/facebookresearch/amortized-optimization-tutorial.
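To make the amortization idea concrete, here is a minimal sketch in PyTorch (not taken from the tutorial's repository; the quadratic objective family, network sizes, and all names are illustrative assumptions). A network mapping each problem instance x to a predicted solution is trained by directly minimizing the expected objective value E_x[f(model(x); x)] over sampled instances, instead of calling a solver on each one; the family f(y; x) = 0.5||y - x||^2 has the known minimizer y*(x) = x, so the learned predictions are easy to check.

import torch
import torch.nn as nn

# Parametric objective: each context x defines one optimization
# problem over y; here f(y; x) = 0.5 * ||y - x||^2, so argmin_y f = x.
def f(y, x):
    return 0.5 * ((y - x) ** 2).sum(dim=-1)

# Amortization model: maps a problem instance x to a predicted solution.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.randn(128, 2)       # sample a batch of problem instances
    loss = f(model(x), x).mean()  # objective-based amortization loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, model(x) approximates argmin_y f(y; x) in a single
# forward pass, amortizing the per-instance cost of running a solver.

Regression onto precomputed solver outputs (a loss of the form ||model(x) - y*(x)||^2) is the other common amortization loss; the objective-based form above avoids needing ground-truth solutions at training time.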

MSC:

68T05 Learning and adaptive systems in artificial intelligence
65K10 Numerical optimization and variational techniques
68T07 Artificial neural networks and deep learning
65-02 Research exposition (monographs, survey articles) pertaining to numerical analysis
68-02 Research exposition (monographs, survey articles) pertaining to computer science

References:

[1] Abadi, M., P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. (2016). “TensorFlow: A system for large-scale machine learning”. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). 265-283.
[2] Absil, P.-A., R. Mahony, and R. Sepulchre. (2009). Optimization algorithms on matrix manifolds. Princeton University Press. · Zbl 1147.65043
[3] Adams, R. P. and R. S. Zemel. (2011). “Ranking via Sinkhorn propagation”. ArXiv preprint. abs/1106.1925.
[4] Adler, J., A. Ringh, O. Öktem, and J. Karlsson. (2017). “Learning to solve inverse problems using Wasserstein loss”. ArXiv preprint. abs/1710.10898.
[5] Agrawal, A., B. Amos, S. T. Barratt, S. P. Boyd, S. Diamond, and J. Z. Kolter. (2019a). “Differentiable Convex Optimization Layers”. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. 9558-9570. url: https://proceedings.neurips.cc/paper/2019/hash/9ce3c52fc54362e22053399d3181c638-Abstract.html.
[6] Agrawal, A., A. N. Modi, A. Passos, A. Lavoie, A. Agarwal, A. Shankar, I. Ganichev, J. Levenberg, M. Hong, R. Monga, and S. Cai. (2019b). “TensorFlow Eager: A multi-stage, Python-embedded DSL for machine learning”. In: Proceedings of Machine Learning and Systems 2019, MLSys 2019, Stanford, CA, USA, March 31 - April 2, 2019. url: https://proceedings.mlsys.org/book/286.pdf.
[7] Aho, A. V., R. Sethi, and J. D. Ullman. (1986). Compilers: Principles, Techniques, and Tools. Addison-Wesley.
[8] Alabi, D., A. T. Kalai, K. Ligett, C. Musco, C. Tzamos, and E. Vitercik. (2019). “Learning to Prune: Speeding up Repeated Computations”. In: Conference on Learning Theory, COLT 2019, 25-28 June 2019, Phoenix, AZ, USA. Vol. 99. Proceedings of Machine Learning Research. PMLR. 30-33. url: http://proceedings.mlr.press/v99/alabi19a.html.
[9] Ali, A., E. Wong, and J. Z. Kolter. (2017). “A Semismooth Newton Method for Fast, Generic Convex Programming”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 70-79. url: http://proceedings.mlr.press/v70/ali17a.html.
[10] Allgower, E. L. and K. Georg. (2012). Numerical continuation methods: An introduction. Vol. 13. Springer Science & Business Media.
[11] Amos, B. (2019). “Differentiable Optimization-Based Modeling for Machine Learning”. PhD thesis. Carnegie Mellon University.
[12] Amos, B. (2023). “On amortizing convex conjugates for optimal transport”. In: The Eleventh International Conference on Learning Representations, ICLR.
[13] Amos, B., S. Cohen, G. Luise, and I. Redko. (2023). “Meta Optimal Transport”. In: International Conference on Machine Learning, ICML 2023. Proceedings of Machine Learning Research.
[14] Amos, B. and J. Z. Kolter. (2017). “OptNet: Differentiable Optimization as a Layer in Neural Networks”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 136-145. url: http://proceedings.mlr.press/v70/amos17a.html.
[15] Amos, B., V. Koltun, and J. Z. Kolter. (2019). “The limited multi-label projection layer”. ArXiv preprint. abs/1906.08707.
[16] Amos, B., I. D. J. Rodriguez, J. Sacks, B. Boots, and J. Z. Kolter. (2018). “Differentiable MPC for End-to-end Planning and Control”. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada. 8299-8310. url: https://proceedings.neurips.cc/paper/2018/hash/ba6d843eb4251a4526ce65d1807a9309-Abstract.html.
[17] Amos, B., S. Stanton, D. Yarats, and A. G. Wilson. (2021). “On the model-based stochastic value gradient for continuous reinforcement learning”. In: Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control, L4DC 2021, 7-8 June 2021, Virtual Event, Switzerland. Vol. 144. Proceedings of Machine Learning Research. PMLR. 6-20. url: http://proceedings.mlr.press/v144/amos21a.html.
[18] Amos, B., L. Xu, and J. Z. Kolter. (2017). “Input Convex Neural Networks”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 146-155. url: http://proceedings.mlr.press/v70/amos17b.html.
[19] Amos, B. and D. Yarats. (2020). “The Differentiable Cross-Entropy Method”. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Vol. 119. Proceedings of Machine Learning Research. PMLR. 291-302. url: http://proceedings.mlr.press/v119/amos20a.html.
[20] Anderson, D. G. (1965). “Iterative procedures for nonlinear integral equations”. Journal of the ACM (JACM). 12(4): 547-560. · Zbl 0149.11503
[21] Andrychowicz, M., M. Denil, S. G. Colmenarejo, M. W. Hoffman, D. Pfau, T. Schaul, and N. de Freitas. (2016). “Learning to learn by gradient descent by gradient descent”. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. 3981-3989. url: https://proceedings.neurips.cc/paper/2016/hash/fb87582825f9d28a8d42c5e5e5e8b23d-Abstract.html.
[22] Antoniou, A., H. Edwards, and A. J. Storkey. (2019). “How to train your MAML”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=HJGven05Y7.
[23] Arbel, M. and J. Mairal. (2022). “Amortized Implicit Differentiation for Stochastic Bilevel Optimization”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=3PN4iyXBeF.
[24] Arjovsky, M., S. Chintala, and L. Bottou. (2017). “Wasserstein Generative Adversarial Networks”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 214-223. url: http://proceedings.mlr.press/v70/arjovsky17a.html.
[25] Arnold, S. M., P. Mahajan, D. Datta, I. Bunner, and K. S. Zarkias. (2020). “learn2learn: A library for meta-learning research”. ArXiv preprint. abs/2008.12284.
[26] Ba, J. L., J. R. Kiros, and G. E. Hinton. (2016). “Layer Normalization”. ArXiv preprint. abs/1607.06450.
[27] Bae, J., P. Vicol, J. Z. HaoChen, and R. B. Grosse. (2022). “Amortized proximal optimization”. Advances in Neural Information Processing Systems. 35: 8982-8997.
[28] Bai, S., J. Z. Kolter, and V. Koltun. (2019). “Deep Equilibrium Models”. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. 688-699. url: https://proceedings.neurips.cc/paper/2019/hash/01386bd6d8e091c2ab4c7c7de644d37b-Abstract.html.
[29] Bai, S., V. Koltun, and J. Z. Kolter. (2020). “Multiscale Deep Equilibrium Models”. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. url: https://proceedings.neurips.cc/paper/2020/hash/3812f9a59b634c2a9c574610eaba5bed-Abstract.html.
[30] Bai, S., V. Koltun, and J. Z. Kolter. (2022). “Neural Deep Equilibrium Solvers”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=B0oHOwT5ENL.
[31] Baird, L. (1995). “Residual algorithms: Reinforcement learning with function approximation”. In: Machine Learning Proceedings 1995. Elsevier. 30-37.
[32] Baker, K. (2019). “Learning warm-start points for AC optimal power flow”. In: 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE. 1-6.
[33] Balcan, M.-F. (2020). “Data-driven algorithm design”. ArXiv preprint. abs/2011.07177.
[34] Banerjee, A. G. and N. Roy. (2015). “Efficiently solving repeated integer linear programming problems by learning solutions of similar linear programming problems using boosting trees”. Computer Science and Artificial Intelligence Laboratory Technical Report. MIT.
[35] Banert, S., A. Ringh, J. Adler, J. Karlsson, and O. Öktem. (2020). “Data-driven nonsmooth optimization”. SIAM Journal on Optimization. 30(1): 102-131. · Zbl 1435.90105
[36] Banert, S., J. Rudzusika, O. Öktem, and J. Adler. (2021). “Accelerated Forward-Backward Optimization using Deep Learning”. ArXiv preprint. abs/2105.05210.
[37] Bank, B., J. Guddat, D. Klatte, B. Kummer, and K. Tammer. (1982). Non-linear parametric optimization. Springer. · Zbl 0502.49002
[38] Barratt, S. (2018). “On the differentiability of the solution to convex optimization problems”. ArXiv preprint. abs/1804.05098.
[39] Baxter, J. (1998). “Theoretical models of learning to learn”. In: Learning to Learn. Springer. 71-94.
[40] Beck, A. and M. Teboulle. (2009). “A fast iterative shrinkage-thresholding algorithm for linear inverse problems”. SIAM Journal on Imaging Sciences. 2(1): 183-202. · Zbl 1175.94009
[41] Belanger, D. (2017). “Deep energy-based models for structured prediction”. PhD thesis. University of Massachusetts Amherst.
[42] Belanger, D. and A. McCallum. (2016). “Structured Prediction Energy Networks”. In: Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. Vol. 48. JMLR Workshop and Conference Proceedings.
[43] Belanger, D., B. Yang, and A. McCallum. (2017). “End-to-End Learning for Structured Prediction Energy Networks”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 429-439. url: http://proceedings.mlr.press/v70/belanger17a.html.
[44] Bellman, R. (1966). “Dynamic programming”. Science. 153(3731): 34-37.
[45] Bello, I., B. Zoph, V. Vasudevan, and Q. V. Le. (2017). “Neural Optimizer Search with Reinforcement Learning”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 459-468. url: http://proceedings.mlr.press/v70/bello17a.html.
[46] Bengio, S., Y. Bengio, and J. Cloutier. (1994). “Use of genetic programming for the search of a new learning rule for neural networks”. In: First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence. IEEE. 324-327.
[47] Bengio, Y., A. Lodi, and A. Prouvost. (2021). “Machine learning for combinatorial optimization: a methodological tour d'horizon”. European Journal of Operational Research. 290(2): 405-421. · Zbl 1487.90541
[48] Bertinetto, L., J. F. Henriques, P. H. S. Torr, and A. Vedaldi. (2019). “Meta-learning with differentiable closed-form solvers”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=HyxnZh0ct7.
[49] Berto, F., S. Massaroli, M. Poli, and J. Park. (2022). “Neural Solvers for Fast and Accurate Numerical Optimal Control”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=m8bypnj7Yl5.
[50] Bertsekas, D. (2015). Convex optimization algorithms. Athena Scientific.
Bertsekas, D. P. (1971). “Control of uncertain systems with a set-membership description of the uncertainty.” PhD thesis. Massachusetts Institute of Technology.
[51] Bertsekas, D. P. (2000). Dynamic Programming and Optimal Control. 2nd. Athena Scientific.
[52] Bertsimas, D. and B. Stellato. (2019). “Online mixed-integer optimization in milliseconds”. ArXiv preprint. abs/1907.02206. · Zbl 07587567
[53] Bertsimas, D. and B. Stellato. (2021). “The voice of optimization”. Machine Learning. 110(2): 249-277. · Zbl 07432802
[54] Bezanson, J., A. Edelman, S. Karpinski, and V. B. Shah. (2017). “Julia: A fresh approach to numerical computing”. SIAM Review. 59(1): 65-98. · Zbl 1356.68030
[55] Bhardwaj, M., B. Boots, and M. Mukadam. (2020). “Differentiable Gaussian process motion planning”. In: IEEE International Conference on Robotics and Automation (ICRA). IEEE. 10598-10604.
[56] Blechschmidt, J. and O. G. Ernst. (2021). “Three ways to solve partial differential equations with neural networks - a review”. GAMM-Mitteilungen: e202100006. · Zbl 1530.65137
[57] Blei, D. M., A. Kucukelbir, and J. D. McAuliffe. (2017). “Variational inference: A review for statisticians”. Journal of the American Statistical Association. 112(518): 859-877.
[58] Blondel, M. (2019). “Structured Prediction with Projection Oracles”. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. 12145-12156. url: https://proceedings.neurips.cc/paper/2019/hash/7990ec44fcf3d7a0e5a2add28362213c-Abstract.html.
[59] Blondel, M., Q. Berthet, M. Cuturi, R. Frostig, S. Hoyer, F. Llinares-López, F. Pedregosa, and J.-P. Vert. (2022). “Efficient and modular implicit differentiation”. Advances in Neural Information Processing Systems. 35: 5230-5242.
[60] Blondel, M., A. F. T. Martins, and V. Niculae. (2020). “Learning with Fenchel-Young losses”. Journal of Machine Learning Research. 21: 35:1-35:69. url: http://jmlr.org/papers/v21/19-021.html. · Zbl 1498.68225
[61] Bonnans, J. F. and A. Shapiro. (2013). Perturbation analysis of optimization problems. Springer Science & Business Media.
[62] Boyd, S., N. Parikh, E. Chu, B. Peleato, and J. Eckstein. (2011). “Distributed optimization and statistical learning via the alternating direction method of multipliers”. Foundations and Trends® in Machine Learning. 3(1): 1-122. · Zbl 1229.90122
[63] Boyd, S. and L. Vandenberghe. (2004). Convex optimization. Cambridge University Press. · Zbl 1058.90049
[64] Bradbury, J., R. Frostig, P. Hawkins, M. J. Johnson, C. Leary, D. Maclaurin, and S. Wanderman-Milne. (2020). “JAX: composable transformations of Python+NumPy programs, 2018”.
[65] Brockman, G., V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. (2016). “OpenAI Gym”. ArXiv preprint. abs/1606.01540.
[66] Broyden, C. G. (1965). “A class of methods for solving nonlinear simultaneous equations”. Mathematics of Computation. 19(92): 577-593. · Zbl 0131.13905
[67] Bubeck, S. (2015). “Convex optimization: Algorithms and complexity”. Foundations and Trends® in Machine Learning. 8(3-4): 231-357. · Zbl 1365.90196
[68] Bunne, C., A. Krause, and M. Cuturi. (2022). “Supervised training of conditional Monge maps”. ArXiv preprint. abs/2206.14262.
[69] Burgess, C. P., I. Higgins, A. Pal, L. Matthey, N. Watters, G. Desjardins, and A. Lerchner. (2018). “Understanding disentangling in β-VAE”. ArXiv preprint. abs/1804.03599.
[70] Busseti, E., W. M. Moursi, and S. Boyd. (2019). “Solution refinement at regular points of conic problems”. Computational Optimization and Applications. 74(3): 627-643. · Zbl 1434.90132
[71] Byravan, A., L. Hasenclever, P. Trochim, M. Mirza, A. D. Ialongo, Y. Tassa, J. T. Springenberg, A. Abdolmaleki, N. Heess, J. Merel, and M. A. Riedmiller. (2022). “Evaluating Model-Based Planning and Planner Amortization for Continuous Control”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=SS8F6tFX3-.
[72] Byravan, A., J. T. Springenberg, A. Abdolmaleki, R. Hafner, M. Neunert, T. Lampe, N. Siegel, N. Heess, and M. Riedmiller. (2019). “Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models”. ArXiv preprint. abs/1910.04142.
[73] Camacho, E. F. and C. B. Alba. (2013). Model predictive control. Springer Science & Business Media.
[74] Cappart, Q., D. Chételat, E. Khalil, A. Lodi, C. Morris, and P. Veličković. (2021). “Combinatorial optimization and reasoning with graph neural networks”. ArXiv preprint. abs/2102.09544.
[75] Carter, M. (2001). Foundations of mathematical economics. MIT Press.
Caruana, R. (1997). “Multitask learning”. Machine Learning. 28(1): 41-75.
[76] Cauligi, A., P. Culbertson, E. Schmerling, M. Schwager, B. Stellato, and M. Pavone. (2021). “CoCo: Online Mixed-Integer Control via Supervised Learning”. IEEE Robotics and Automation Letters.
[77] Cauligi, A., P. Culbertson, B. Stellato, D. Bertsimas, M. Schwager, and M. Pavone. (2020). “Learning mixed-integer convex optimization strategies for robot planning and control”. In: IEEE Conference on Decision and Control (CDC). IEEE. 1698-1705.
[78] Chandak, Y., G. Theocharous, J. Kostas, S. M. Jordan, and P. S. Thomas. (2019). “Learning Action Representations for Reinforcement Learning”. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Vol. 97. Proceedings of Machine Learning Research. PMLR. 941-950. url: http://proceedings.mlr.press/v97/chandak19a.html.
[79] Chang, J. R., C. Li, B. Póczos, and B. V. K. V. Kumar. (2017). “One Network to Solve Them All - Solving Linear Inverse Problems Using Deep Projection Models”. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017. IEEE Computer Society. 5889-5898. doi: 10.1109/ICCV.2017.627.
[80] Charton, F. (2021). “Linear algebra with transformers”. ArXiv preprint. abs/2112.01898.
[81] Charton, F., A. Hayat, S. T. McQuade, N. J. Merrill, and B. Piccoli. (2021). “A deep language model to predict metabolic network equilibria”. ArXiv preprint. abs/2112.03588.
[82] Chen, J. Y., S. Silwal, A. Vakilian, and F. Zhang. (2022a). “Faster Fundamental Graph Algorithms via Learned Predictions”. In: International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Vol. 162. Proceedings of Machine Learning Research. PMLR. 3583-3602. url: https://proceedings.mlr.press/v162/chen22v.html.
[83] Chen, S. S., D. L. Donoho, and M. A. Saunders. (2001). “Atomic decomposition by basis pursuit”. SIAM Review. 43(1): 129-159. · Zbl 0979.94010
[84] Chen, S. W., T. Wang, N. Atanasov, V. Kumar, and M. Morari. (2022b). “Large scale model predictive control with neural networks and primal active sets”. Automatica. 135: 109947. · Zbl 1478.93162
[85] Chen, T., X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin. (2021a). “Learning to optimize: A primer and a benchmark”. ArXiv preprint. abs/2103.12828.
[86] Chen, T., B. Xu, C. Zhang, and C. Guestrin. (2016). “Training deep nets with sublinear memory cost”. ArXiv preprint. abs/1604.06174.
[87] Chen, Y., B. Hosseini, H. Owhadi, and A. M. Stuart. (2021b). “Solving and learning nonlinear PDEs with Gaussian processes”. ArXiv preprint. abs/2103.12959. · Zbl 07516428
[88] Chen, Y., A. L. Friesen, F. Behbahani, A. Doucet, D. Budden, M. Hoffman, and N. de Freitas. (2020). “Modular Meta-Learning with Shrinkage”. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. url: https://proceedings.neurips.cc/paper/2020/hash/1e04b969bf040acd252e1faafb51f829-Abstract.html.
[89] Chen, Y., M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. de Freitas. (2017). “Learning to Learn without Gradient Descent by Gradient Descent”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 748-756. url: http://proceedings.mlr.press/v70/chen17e.html.
[90] Cho, K., B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. (2014). “Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation”. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics. 1724-1734. doi: 10.3115/v1/D14-1179.
[91] Chung, J., K. Kastner, L. Dinh, K. Goel, A. C. Courville, and Y. Bengio. (2015). “A Recurrent Latent Variable Model for Sequential Data”. In: Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada. 2980-2988. url: https://proceedings.neurips.cc/paper/2015/hash/b618c3210e934362ac261db280128c22-Abstract.html.
[92] Cohen, S., B. Amos, and Y. Lipman. (2021). “Riemannian Convex Potential Maps”. In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Vol. 139. Proceedings of Machine Learning Research. PMLR. 2028-2038. url: http://proceedings.mlr.press/v139/cohen21a.html.
[93] Cover, T. M. and J. A. Thomas. (2006). “Elements of Information Theory”. Wiley Series in Telecommunications and Signal Processing. · Zbl 1140.94001
[94] Cremer, C., X. Li, and D. Duvenaud. (2018). “Inference Suboptimality in Variational Autoencoders”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 1086-1094. url: http://proceedings.mlr.press/v80/cremer18a.html.
[95] Cruz, R. S., B. Fernando, A. Cherian, and S. Gould. (2017). “DeepPermNet: Visual Permutation Learning”. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society. 6044-6052. doi: 10.1109/CVPR.2017.640.
[96] Cuturi, M. (2013). “Sinkhorn Distances: Lightspeed Computation of Optimal Transport”. In: Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States. 2292-2300. url: https://proceedings.neurips.cc/paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html.
[97] Cuturi, M. and M. Blondel. (2017). “Soft-DTW: a Differentiable Loss Function for Time-Series”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 894-903. url: http://proceedings.mlr.press/v70/cuturi17a.html.
[98] d’Ascoli, S., P.-A. Kamienny, G. Lample, and F. Charton. (2022). “Deep Symbolic Regression for Recurrent Sequences”. arXiv: 2201.04600 [cs.LG].
[99] Dam, N., Q. Hoang, T. Le, T. D. Nguyen, H. Bui, and D. Phung. (2019). “Three-Player Wasserstein GAN via Amortised Duality”. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019. 2202-2208. doi: 10.24963/ijcai.2019/305.
[100] Danskin, J. M. (1966). “The theory of max-min, with applications”. SIAM Journal on Applied Mathematics. 14(4): 641-664. · Zbl 0144.43301
[101] Daubechies, I., M. Defrise, and C. De Mol. (2004). “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint”. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences. 57(11): 1413-1457. · Zbl 1077.65055
[102] Davidson, J. W. and S. Jinturkar. (1995). “An aggressive approach to loop unrolling”. Tech. rep. Citeseer.
[103] Dayan, P., G. E. Hinton, R. M. Neal, and R. S. Zemel. (1995). “The Helmholtz machine”. Neural Computation. 7(5): 889-904.
[104] De Boer, P.-T., D. P. Kroese, S. Mannor, and R. Y. Rubinstein. (2005). “A tutorial on the cross-entropy method”. Annals of Operations Research. 134(1): 19-67. · Zbl 1075.90066
[105] Deisenroth, M. P., A. A. Faisal, and C. S. Ong. (2020). Mathematics for machine learning. Cambridge University Press. · Zbl 1491.68002
[106] Deisenroth, M. P. and C. E. Rasmussen. (2011). “PILCO: A Model-Based and Data-Efficient Approach to Policy Search”. In: Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 -July 2, 2011. Omnipress. 465-472. url: https://icml.cc/2011/papers/323
[107] Deleu, T., T. Würfl, M. Samiei, J. P. Cohen, and Y. Bengio. (2019). “Torchmeta: A meta-learning library for pytorch”. ArXiv preprint. abs/1909.06576.
[108] Deshpande, I., Y. Hu, R. Sun, A. Pyrros, N. Siddiqui, S. Koyejo, Z. Zhao, D. A. Forsyth, and A. G. Schwing. (2019). “Max-Sliced Wasserstein Distance and Its Use for GANs”. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE. 10648-10656. doi: 10.1109/CVPR.2019.01090.
[109] Diamond, S. and S. Boyd. (2016). “CVXPY: A Python-embedded modeling language for convex optimization”. The Journal of Machine Learning Research. 17(1): 2909-2913. · Zbl 1360.90008
[110] Dini, U. (1878). Analisi infinitesimale. Lithografia Gorani.
[111] Dinitz, M., S. Im, T. Lavastida, B. Moseley, and S. Vassilvitskii. (2021). “Faster Matchings via Learned Duals”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 10393-10406. url: https://proceedings.neurips.cc/paper/2021/hash/5616060fb8ae85d93f334e7267307664-Abstract.html.
[112] Doersch, C. (2016). “Tutorial on variational autoencoders”. ArXiv preprint. abs/1606.05908.
[113] Domke, J. (2012). “Generic methods for optimization-based modeling”. In: AISTATS. 318-326.
[114] Dong, W., Z. Xie, G. Kestor, and D. Li. (2020). “Smart-PGSim: using neural network to accelerate AC-OPF power grid simulation”. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. 1-15.
[115] Donoho, D. L. and M. Elad. (2003). “Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization”. Proceedings of the National Academy of Sciences. 100(5): 2197-2202. · Zbl 1064.94011
[116] Dontchev, A. L. and R. T. Rockafellar. (2009). Implicit functions and solution mappings. Springer. · Zbl 1178.26001
[117] Donti, P. L., D. Rolnick, and J. Z. Kolter. (2021). “DC3: A learning method for optimization with hard constraints”. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. url: https://openreview.net/forum?id=V1ZHVxJ6dSS.
[118] Drori, I., S. Tran, R. Wang, N. Cheng, K. Liu, L. Tang, E. Ke, N. Singh, T. L. Patti, J. Lynch, A. Shporer, N. Verma, E. Wu, and G. Strang. (2021). “A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More”. arXiv: 2112.15594 [cs.LG].
[119] Duchi, J. C., E. Hazan, and Y. Singer. (2010). “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”. In: COLT 2010 - The 23rd Conference on Learning Theory, Haifa, Israel, June 27-29, 2010. Omnipress. 257-269. url: http://colt2010.haifa.il.ibm.com/papers/COLT2010proceedings.pdf.
[120] Dunning, I., J. Huchette, and M. Lubin. (2017). “JuMP: A modeling language for mathematical optimization”. SIAM Review. 59(2): 295-320. · Zbl 1368.90002
[121] Dupont, E. (2018). “Learning Disentangled Joint Continuous and Discrete Representations”. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada. 708-718. url: https://proceedings.neurips.cc/paper/2018/hash/b9228e0962a78b84f3d5d92f4faa000b-Abstract.html.
[122] Duruisseaux, V. and M. Leok. (2022). “Accelerated Optimization on Riemannian Manifolds via Projected Variational Integrators”. arXiv: 2201.02904 [math.OC]. · Zbl 1493.90225
[123] Ernst, D., P. Geurts, and L. Wehenkel. (2005). “Tree-based batch mode reinforcement learning”. Journal of Machine Learning Research. 6. · Zbl 1222.68193
[124] Fiacco, A. V. (2020). Mathematical programming with data perturbations. CRC Press.
[125] Fiacco, A. V. and Y. Ishizuka. (1990). “Sensitivity and stability analysis for nonlinear programming”. Annals of Operations Research. 27(1): 215-235. · Zbl 0718.90086
[126] Fickinger, A., H. Hu, B. Amos, S. J. Russell, and N. Brown. (2021). “Scalable Online Planning via Reinforcement Learning Fine-Tuning”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 16951-16963. url: https://proceedings.neurips.cc/paper/2021/hash/8ce8b102d40392a688f8c04b3cd6cae0-Abstract.html.
[127] Finn, C., P. Abbeel, and S. Levine. (2017). “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 1126-1135. url: http://proceedings.mlr.press/v70/finn17a.html.
[128] Fleiss, J. (1993). “Review papers: The statistical basis of meta-analysis”. Statistical Methods in Medical Research. 2(2): 121-145.
[129] Foerster, J. N., R. Y. Chen, M. Al-Shedivat, S. Whiteson, P. Abbeel, and I. Mordatch. (2017). “Learning with opponent-learning awareness”. ArXiv preprint. abs/1709.04326.
[130] Franceschi, L., M. Donini, P. Frasconi, and M. Pontil. (2017). “Forward and Reverse Gradient-Based Hyperparameter Optimization”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 1165-1173. url: http://proceedings.mlr.press/v70/franceschi17a.html.
[131] Franceschi, L., P. Frasconi, S. Salzo, R. Grazzi, and M. Pontil. (2018). “Bilevel Programming for Hyperparameter Optimization and Meta-Learning”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 1563-1572. url: http://proceedings.mlr.press/v80/franceschi18a.html.
[132] Fujimoto, S., H. van Hoof, and D. Meger. (2018). “Addressing Function Approximation Error in Actor-Critic Methods”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 1582-1591. url: http://proceedings.mlr.press/v80/fujimoto18a.html.
[133] Fujimoto, S., D. Meger, D. Precup, O. Nachum, and S. S. Gu. (2022). “Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error”. In: International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Vol. 162. Proceedings of Machine Learning Research. PMLR. 6918-6943. url: https://proceedings.mlr.press/v162/fujimoto22a.html.
[134] Gao, Z., Y. Wu, Y. Jia, and M. Harandi. (2020). “Learning to Optimize on SPD Manifolds”. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020. IEEE. 7697-7706. doi: 10.1109/CVPR42600.2020.00772.
[135] Garcia, J. R., F. Freddi, S. Fotiadis, M. Li, S. Vakili, A. Bernacchia, and G. Hennequin. (2023). “Fisher-Legendre (FishLeg) optimization of deep neural networks”. In: The Eleventh International Conference on Learning Representations, ICLR.
[136] Garnelo, M., D. Rosenbaum, C. Maddison, T. Ramalho, D. Saxton, M. Shanahan, Y. W. Teh, D. J. Rezende, and S. M. A. Eslami. (2018). “Conditional Neural Processes”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 1690-1699. url: http://proceedings.mlr.press/v80/garnelo18a.html.
[137] Geist, M., B. Piot, and O. Pietquin. (2017). “Is the Bellman residual a bad proxy?” In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 3205-3214. url: https://proceedings.neurips.cc/paper/2017/hash/e0ab531ec312161511493b002f9be2ee-Abstract.html.
[138] Gershman, S. and N. Goodman. (2014). “Amortized inference in probabilistic reasoning”. In: Proceedings of the Annual Meeting of the Cognitive Science Society. Vol. 36.
[139] Goodfellow, I. J., J. Shlens, and C. Szegedy. (2015). “Explaining and Harnessing Adversarial Examples”. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. url: http://arxiv.org/abs/1412.6572.
[140] Gordon, J., J. Bronskill, M. Bauer, S. Nowozin, and R. E. Turner. (2019). “Meta-Learning Probabilistic Inference for Prediction”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=HkxStoC5F7.
[141] Gould, S., B. Fernando, A. Cherian, P. Anderson, R. S. Cruz, and E. Guo. (2016). “On differentiating parameterized argmin and argmax problems with application to bi-level optimization”. ArXiv preprint. abs/1607.05447.
[142] Grazzi, R., L. Franceschi, M. Pontil, and S. Salzo. (2020). “On the Iteration Complexity of Hypergradient Computation”. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Vol. 119. Proceedings of Machine Learning Research. PMLR. 3748-3758. url: http://proceedings.mlr.press/v119/grazzi20a.html.
[143] Grefenstette, E., B. Amos, D. Yarats, P. M. Htut, A. Molchanov, F. Meier, D. Kiela, K. Cho, and S. Chintala. (2019). “Generalized Inner Loop Meta-Learning”. ArXiv preprint. abs/1910.01727.
[144] Gregor, K. and Y. LeCun. (2010). “Learning Fast Approximations of Sparse Coding”. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel. Omnipress. 399-406. url: https://icml.cc/Conferences/2010/papers/449.pdf.
[145] Gruslys, A., R. Munos, I. Danihelka, M. Lanctot, and A. Graves. (2016). “Memory-Efficient Backpropagation Through Time”. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. 4125-4133. url: https://proceedings.neurips.cc/paper/2016/hash/a501bebf79d570651ff601788ea9d16d-Abstract.html.
[146] Grzeszczuk, R., D. Terzopoulos, and G. Hinton. (1998). “NeuroAnimator: Fast neural network emulation and control of physics-based models”. In: 25th Annual Conference on Computer Graphics and Interactive Techniques. 9-20.
[147] Guiasu, S. and A. Shenitzer. (1985). “The principle of maximum entropy”. The Mathematical Intelligencer. 7(1): 42-48. · Zbl 0563.94008
[148] Gurumurthy, S., S. Bai, Z. Manchester, and J. Z. Kolter. (2021). “Joint inference and input optimization in equilibrium networks”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 16818-16832. url: https://proceedings.neurips.cc/paper/2021/hash/8c3c27ac7d298331a1bdfd0a5e8703d3-Abstract.html.
[149] Ha, D., A. M. Dai, and Q. V. Le. (2017). “HyperNetworks”. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. url: https://openreview.net/forum?id=rkpACe1lx.
[150] Haarnoja, T., A. Zhou, K. Hartikainen, G. Tucker, S. Ha, J. Tan, V. Kumar, H. Zhu, A. Gupta, P. Abbeel, et al. (2018). “Soft actor-critic algorithms and applications”. ArXiv preprint. abs/1812.05905.
[151] Habets, P. (2010). “Stabilité Asymptotique pour des Problèmes de Perturbations Singulières”. In: Stability Problems. Springer. 2-18.
[152] Hafner, D., T. P. Lillicrap, J. Ba, and M. Norouzi. (2020). “Dream to Control: Learning Behaviors by Latent Imagination”. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. url: https://openreview.net/forum?id=S1lOTC4tDS.
[153] Han, T., Y. Lu, S. Zhu, and Y. N. Wu. (2017). “Alternating Back-Propagation for Generator Network”. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA. AAAI Press. 1976-1984. url: http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14784.
[154] Harlow, H. F. (1949). “The formation of learning sets.” Psychological Review. 56(1): 51.
[155] Harrison, J., L. Metz, and J. Sohl-Dickstein. (2022). “A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases”. ArXiv preprint. abs/2209.11208.
[156] He, H. and R. Zou. (2021). “functorch: JAX-like composable function transforms for PyTorch”. GitHub.
[157] He, K., X. Zhang, S. Ren, and J. Sun. (2016). “Identity mappings in deep residual networks”. In: ECCV. Springer. 630-645.
[158] He, S., Y. Li, Y. Feng, S. Ho, S. Ravanbakhsh, W. Chen, and B. Póczos. (2019). “Learning to predict the cosmological structure formation”. Proceedings of the National Academy of Sciences. 116(28): 13825-13832. · Zbl 1431.83191
[159] Heess, N., G. Wayne, D. Silver, T. P. Lillicrap, T. Erez, and Y. Tassa. (2015). “Learning Continuous Control Policies by Stochastic Value Gradients”. In: Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada. 2944-2952. url: https://proceedings.neurips.cc/paper/2015/hash/148510031349642de5ca0c544f31b2ef-Abstract.html.
[160] Henaff, M., A. Canziani, and Y. LeCun. (2019). “Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=HygQBn0cYm.
[161] Higgins, I., L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. (2017). “beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework”. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. url: https://openreview.net/forum?id=Sy2fzU9gl.
[162] Hochreiter, S. and J. Schmidhuber. (1997). “Long short-term memory”. Neural Computation. 9(8): 1735-1780.
[163] Hochreiter, S., A. S. Younger, and P. R. Conwell. (2001). “Learning to learn using gradient descent”. In: International Conference on Artificial Neural Networks. Springer. 87-94. · Zbl 1001.68724
[164] Hoffman, M. D., D. M. Blei, C. Wang, and J. Paisley. (2013). “Stochastic variational inference.” Journal of Machine Learning Research. 14(5). · Zbl 1317.68163
[165] Hospedales, T., A. Antoniou, P. Micaelli, and A. Storkey. (2020). “Meta-learning in neural networks: A survey”. ArXiv preprint. abs/2004.05439.
[166] Hoyer, S., J. Sohl-Dickstein, and S. Greydanus. (2019). “Neural reparameterization improves structural optimization”. ArXiv preprint. abs/1909.04240.
[167] Hu, J., X. Liu, Z. Wen, and Y. Yuan. (2019). “A Brief Introduction to Manifold Optimization”. Journal of the Operations Research Society of China. 8: 199-248. · Zbl 1474.49093
[168] Huang, K., N. D. Sidiropoulos, and A. P. Liavas. (2016). “A flexible and efficient algorithmic framework for constrained matrix and tensor factorization”. IEEE Transactions on Signal Processing. 64(19): 5052-5065. · Zbl 1414.94253
[169] Huang, T., T. Chen, S. Liu, S. Chang, L. Amini, and Z. Wang. (2022). “Optimizer Amalgamation”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=VqzXzA9hjaX.
[170] Huszár, F. (2019). “Notes on iMAML: Meta-Learning with Implicit Gradients”. url: http://inference.vc.
[171] Ichnowski, J., P. Jain, B. Stellato, G. Banjac, M. Luo, F. Borrelli, J. E. Gonzalez, I. Stoica, and K. Goldberg. (2021). “Accelerating Quadratic Optimization with Reinforcement Learning”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 21043-21055. url: https://proceedings.neurips.cc/paper/2021/hash/afdec7005cc9f14302cd0474fd0f3c96-Abstract.html.
[172] Jaeger, H. (2002). Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. Vol. 5. GMD-Forschungszentrum Informationstechnik Bonn.
[173] Jeong, Y. and H. O. Song. (2019). “Learning Discrete and Continuous Factors of Data via Alternating Disentanglement”. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Vol. 97. Proceedings of Machine Learning Research. PMLR. 3091-3099. url: http://proceedings.mlr.press/v97/jeong19d.html.
[174] Jordan, M. I., Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. (1999). “An introduction to variational methods for graphical models”. Machine Learning. 37(2): 183-233. · Zbl 0945.68164
[175] Karniadakis, G. E., I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang. (2021). “Physics-informed machine learning”. Nature Reviews Physics. 3(6): 422-440.
[176] Kavukcuoglu, K., M. Ranzato, and Y. LeCun. (2010). “Fast inference in sparse coding algorithms with applications to object recognition”. ArXiv preprint. abs/1010.3467.
[177] Kehoe, E. J. (1988). “A layered network model of associative learning: learning to learn and configuration.” Psychological Review. 95(4): 411.
[178] Khalil, E. B., H. Dai, Y. Zhang, B. Dilkina, and L. Song. (2017). “Learning Combinatorial Optimization Algorithms over Graphs”. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 6348-6358. url: https://proceedings.neurips.cc/paper/2017/hash/d9896106ca98d3d05b8cbdf4fd8b13a1-Abstract.html.
[179] Khalil, E. B., P. L. Bodic, L. Song, G. L. Nemhauser, and B. Dilkina. (2016). “Learning to Branch in Mixed Integer Programming”. In: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA. AAAI Press. 724-731. url: http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12514.
[180] Khodak, M., M.-F. F. Balcan, A. Talwalkar, and S. Vassilvitskii. (2022). “Learning predictions for algorithms with predictions”. Advances in Neural Information Processing Systems. 35: 3542-3555.
[181] Kim, Y., S. Wiseman, A. C. Miller, D. A. Sontag, and A. M. Rush. (2018). “Semi-Amortized Variational Autoencoders”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 2683-2692. url: http://proceedings.mlr.press/v80/kim18e.html.
[182] Kim, Y. H. (2020). “Deep latent variable models of natural language”. PhD thesis. Harvard University.
[183] Kingma, D. P. and M. Welling. (2019). “An Introduction to Variational Autoencoders”. Foundations and Trends® in Machine Learning. 12(4): 307-392. · Zbl 1431.68002
[184] Kingma, D. P. and J. Ba. (2015). “Adam: A Method for Stochastic Optimization”. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. url: http://arxiv.org/abs/1412.6980.
[185] Kingma, D. P. and M. Welling. (2014). “Auto-Encoding Variational Bayes”. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings. url: http://arxiv.org/abs/1312.6114.
[186] Kirk, D. E. (2004). Optimal control theory: an introduction. Courier Corporation.
[187] Klatte, D. and B. Kummer. (2006). Nonsmooth equations in optimization: regularity, calculus, methods and applications. Vol. 60. Springer Science & Business Media.
[188] Knyazev, B., M. Drozdzal, G. W. Taylor, and A. Romero-Soriano. (2021). “Parameter Prediction for Unseen Deep Architectures”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 29433-29448. url: https://proceedings.neurips.cc/paper/2021/hash/f6185f0ef02dcaec414a3171cd01c697-Abstract.html.
[189] Konda, V. and J. Tsitsiklis. (1999). “Actor-critic algorithms”. Advances in Neural Information Processing Systems. 12. · Zbl 1049.93095
[190] Korotin, A., V. Egiazarian, A. Asadulaev, A. Safin, and E. Burnaev. (2021). “Wasserstein-2 Generative Networks”. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. url: https://openreview.net/forum?id=bEoxzW
[191] Kotary, J., F. Fioretto, P. Van Hentenryck, and B. Wilder. (2021). “End-to-end constrained optimization learning: A survey”. ArXiv preprint. abs/2103.16378.
[192] Kovachki, N., S. Lanthaler, and S. Mishra. (2021). “On universal approximation and error bounds for Fourier neural operators”. The Journal of Machine Learning Research. 22(1): 13237-13312. · Zbl 07626805
[193] Kriváchy, T., Y. Cai, J. Bowles, D. Cavalcanti, and N. Brunner. (2020). “Fast semidefinite programming with feedforward neural networks”. ArXiv preprint. abs/2011.05785.
[194] Ladickỳ, L., S. Jeong, B. Solenthaler, M. Pollefeys, and M. Gross. (2015). “Data-driven fluid simulations using regression forests”. ACM Transactions on Graphics (TOG). 34(6): 1-9.
[195] Lake, B. M., T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman. (2017). “Building machines that learn and think like people”. Behavioral and Brain Sciences. 40.
[196] Lambert, N., B. Amos, O. Yadan, and R. Calandra. (2020). “Objective mismatch in model-based reinforcement learning”. ArXiv preprint. abs/2002.04523.
[197] Lample, G. and F. Charton. (2020). “Deep Learning For Symbolic Mathematics”. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. url: https://openreview.net/forum?id=S1eZYeHFDS.
[198] Le, H. M., C. Voloshin, and Y. Yue. (2019). “Batch Policy Learning under Constraints”. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Vol. 97. Proceedings of Machine Learning Research. PMLR. 3703-3712. url: http://proceedings.mlr.press/v97/le19a.html.
[199] LeCun, Y. (1998). “The MNIST database of handwritten digits”. url: http://yann.lecun.com/exdb/mnist/.
[200] Lee, J., Y. Lee, J. Kim, A. R. Kosiorek, S. Choi, and Y. W. Teh. (2019a). “Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks”. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Vol. 97. Proceedings of Machine Learning Research. PMLR. 3744-3753. url: http://proceedings.mlr.press/v97/lee19d.html.
[201] Lee, K., S. Maji, A. Ravichandran, and S. Soatto. (2019b). “Meta-Learning With Differentiable Convex Optimization”. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. 10657-10665. doi: 10.1109/CVPR.2019.01091.
[202] Levine, S. and P. Abbeel. (2014). “Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics”. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada. 1071-1079. url: https://proceedings.neurips.cc/paper/2014/hash/6766aa2750c19aad2fa1b32f36ed4aee-Abstract.html.
[203] Levine, S., C. Finn, T. Darrell, and P. Abbeel. (2016). “End-to-end training of deep visuomotor policies”. The Journal of Machine Learning Research. 17(1): 1334-1373. · Zbl 1360.68687
[204] Levine, S. and V. Koltun. (2013). “Guided Policy Search”. In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013. Vol. 28. JMLR Workshop and Conference Proceedings. JMLR.org. 1-9. url: http://proceedings.mlr.press/v28/levine13.html.
[205] Li, K. and J. Malik. (2017a). “Learning to Optimize”. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. url: https://openreview.net/forum?id=ry4Vrt5gl.
[206] Li, K. and J. Malik. (2017b). “Learning to optimize neural nets”. ArXiv preprint. abs/1703.00441.
[207] Li, Z., N. B. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. M. Stuart, and A. Anandkumar. (2021a). “Fourier Neural Operator for Parametric Partial Differential Equations”. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. url: https://openreview.net/forum?id=c8P9NQVtmnO.
[208] Li, Z., H. Zheng, N. Kovachki, D. Jin, H. Chen, B. Liu, K. Azizzadenesheli, and A. Anandkumar. (2021b). “Physics-informed neural operator for learning partial differential equations”. ArXiv preprint. abs/2111.03794.
[209] Liao, R., Y. Xiong, E. Fetaya, L. Zhang, K. Yoon, X. Pitkow, R. Urtasun, and R. S. Zemel. (2018). “Reviving and Improving Recurrent Back-Propagation”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 3088-3097. url: http://proceedings.mlr.press/v80/liao18c.html.
[210] Lillicrap, T. P., J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. (2016). “Continuous control with deep reinforcement learning”. In: 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. url: http://arxiv.org/abs/1509.02971.
[211] Liu, X., Y. Lu, A. Abbasi, M. Li, J. Mohammadi, and S. Kolouri. (2022). “Teaching Networks to Solve Optimization Problems”. arXiv: 2202.04104 [cs.LG].
[212] Lodi, A. and G. Zarpellon. (2017). “On learning and branching: a survey”. Top. 25(2): 207-236. · Zbl 1372.90003
[213] Long, J., E. Shelhamer, and T. Darrell. (2015). “Fully convolutional networks for semantic segmentation”. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015. IEEE Computer Society. 3431-3440. doi: 10.1109/CVPR.2015.7298965.
[214] Lorraine, J., P. Vicol, and D. Duvenaud. (2020). “Optimizing Millions of Hyperparameters by Implicit Differentiation”. In: The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26-28 August 2020, Online [Palermo, Sicily, Italy]. Vol. 108. Proceedings of Machine Learning Research. PMLR. 1540-1552. url: http://proceedings.mlr.press/v108/lorraine20a.html.
[216] Lowrey, K., A. Rajeswaran, S. M. Kakade, E. Todorov, and I. Mordatch. (2019). “Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=Byey7n05FQ.
[217] Luo, R., F. Tian, T. Qin, E. Chen, and T. Liu. (2018). “Neural Architecture Optimization”. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada. 7827-7838. url: https://proceedings.neurips.cc/paper/2018/hash/933670f1ac8ba969f32989c312faba75-Abstract.html.
[218] Lv, K., S. Jiang, and J. Li. (2017). “Learning Gradient Descent: Better Generalization and Longer Horizons”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 2247-2255. url: http://proceedings.mlr.press/v70/lv17a.html.
[219] Maclaurin, D. (2016). “Modeling, inference and optimization with composable differentiable procedures”. PhD thesis. Harvard University.
[220] Maclaurin, D., D. Duvenaud, and R. P. Adams. (2015a). “Autograd: Effortless gradients in numpy”. In: ICML 2015 AutoML Workshop. Vol. 238. 5.
[221] Maclaurin, D., D. Duvenaud, and R. P. Adams. (2015b). “Gradient-based Hyperparameter Optimization through Reversible Learning”. In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. Vol. 37. JMLR Workshop and Conference Proceedings. JMLR.org. 2113-2122. url: http://proceedings.mlr.press/v37/maclaurin15.html.
[222] Maheswaranathan, N., D. Sussillo, L. Metz, R. Sun, and J. Sohl-Dickstein. (2021). “Reverse engineering learned optimizers reveals known and novel mechanisms”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 19910-19922. url: https://proceedings.neurips.cc/paper/2021/hash/a57ecd54d4df7d999bd9c5e3b973ec75-Abstract.html.
[223] Maillard, O.-A., R. Munos, A. Lazaric, and M. Ghavamzadeh. (2010). “Finite-sample analysis of Bellman residual minimization”. In: 2nd Asian Conference on Machine Learning. JMLR Workshop and Conference Proceedings. 299-314.
[224] Makkuva, A. V., A. Taghvaei, S. Oh, and J. D. Lee. (2020). “Optimal transport mapping via input convex neural networks”. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Vol. 119. Proceedings of Machine Learning Research. PMLR. 6672-6681. url: http://proceedings.mlr.press/v119/makkuva20a.html.
[225] Marino, J., M. Cvitkovic, and Y. Yue. (2018a). “A General Method for Amortizing Variational Filtering”. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada. 7868-7879. url: https://proceedings.neurips.cc/paper/2018/hash/060afc8a563aaccd288f98b7c8723b61-Abstract.html.
[226] Marino, J., A. Piché, A. D. Ialongo, and Y. Yue. (2021). “Iterative Amortized Policy Optimization”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 15667-15681. url: https://proceedings.neurips.cc/paper/2021/hash/83fa5a432ae55c253d0e60dbfa716723-Abstract.html.
[227] Marino, J., Y. Yue, and S. Mandt. (2018b). “Iterative Amortized Inference”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 3400-3409. url: http://proceedings.mlr.press/v80/marino18a.html.
[228] Marino, J. L. (2021). “Learned Feedback & Feedforward Perception & Control”. PhD thesis. California Institute of Technology.
[229] Marwah, T., Z. C. Lipton, and A. Risteski. (2021). “Parametric Complexity Bounds for Approximating PDEs with Neural Networks”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 15044-15055. url: https://proceedings.neurips.cc/paper/2021/hash/7edccc661418aeb5761dbcdc06ad490c-Abstract.html.
[230] Meinhardt, T., M. Möller, C. Hazirbas, and D. Cremers. (2017). “Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse Imaging Problems”. In: IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017. IEEE Computer Society. 1799-1808. doi: 10.1109/ICCV.2017.198.
[231] Mena, G. E., D. Belanger, S. W. Linderman, and J. Snoek. (2018). “Learning Latent Permutations with Gumbel-Sinkhorn Networks”. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. url: https://openreview.net/forum?id=Byt3oJ-0W.
[232] Merchant, A., L. Metz, S. S. Schoenholz, and E. D. Cubuk. (2021). “Learn2Hop: Learned Optimization on Rough Landscapes”. In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Vol. 139. Proceedings of Machine Learning Research. PMLR. 7643-7653. url: http://proceedings.mlr.press/v139/merchant21a.html.
[233] Metz, L., C. D. Freeman, S. S. Schoenholz, and T. Kachman. (2021). “Gradients are Not All You Need”. ArXiv preprint. abs/2111.05803.
[234] Metz, L., J. Harrison, C. D. Freeman, A. Merchant, L. Beyer, J. Bradbury, N. Agrawal, B. Poole, I. Mordatch, A. Roberts, et al. (2022). “VeLO: Training Versatile Learned Optimizers by Scaling Up”. ArXiv preprint. abs/2211.09760.
[235] Metz, L., N. Maheswaranathan, J. Nixon, C. D. Freeman, and J. Sohl-Dickstein. (2019a). “Understanding and correcting pathologies in the training of learned optimizers”. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Vol. 97. Proceedings of Machine Learning Research. PMLR. 4556-4565. url: http://proceedings.mlr.press/v97/metz19a.html.
[236] Metz, L., N. Maheswaranathan, J. Shlens, J. Sohl-Dickstein, and E. D. Cubuk. (2019b). “Using learned optimizers to make models robust to input noise”. ArXiv preprint. abs/1906.03367.
[237] Metz, L., B. Poole, D. Pfau, and J. Sohl-Dickstein. (2017). “Unrolled Generative Adversarial Networks”. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. url: https://openreview.net/forum?id=BydrOIcle.
[238] Milgrom, P. and I. Segal. (2002). “Envelope theorems for arbitrary choice sets”. Econometrica. 70(2): 583-601. · Zbl 1103.90400
[239] Misra, S., L. Roald, and Y. Ng. (2021). “Learning for constrained optimization: Identifying optimal active constraint sets”. INFORMS Journal on Computing.
[240] Mnih, A. and K. Gregor. (2014). “Neural Variational Inference and Learning in Belief Networks”. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014. Vol. 32. JMLR Workshop and Conference Proceedings. JMLR.org. 1791-1799. url: http://proceedings.mlr.press/v32/mnih14.html.
[241] Mohamed, S., M. Rosca, M. Figurnov, and A. Mnih. (2020). “Monte Carlo Gradient Estimation in Machine Learning”. Journal of Machine Learning Research. 21: 132:1-132:62. url: http://jmlr.org/papers/v21/19-346.html. · Zbl 1518.62006
[242] Monga, V., Y. Li, and Y. C. Eldar. (2021). “Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing”. IEEE Signal Processing Magazine. 38(2): 18-44.
[243] Montgomery, W. H. and S. Levine. (2016). “Guided Policy Search via Approximate Mirror Descent”. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. 4008-4016. url: https://proceedings.neurips.cc/paper/2016/hash/a00e5eb0973d24649a4a920fc53d9564-Abstract.html.
[244] Murphy, K. P. (2012). Machine learning: a probabilistic perspective. MIT Press. · Zbl 1295.68003
[245] Nesterov, Y. (1983). “A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2)”. In: Doklady AN USSR. Vol. 269. 543-547.
[246] Nesterov, Y. (2018). Lectures on convex optimization. Vol. 137. Springer. · Zbl 1427.90003
[247] Nguyen, K. and N. Ho. (2022). “Amortized Projection Optimization for Sliced Wasserstein Generative Models”. ArXiv preprint. abs/2203.13417.
[248] Nichol, A., J. Achiam, and J. Schulman. (2018). “On first-order meta-learning algorithms”. ArXiv preprint. abs/1803.02999.
[249] Nocedal, J. and S. Wright. (2006). Numerical optimization. Springer Science & Business Media. · Zbl 1104.65059
[250] O’Donoghue, B., E. Chu, N. Parikh, and S. Boyd. (2016). “Conic optimization via operator splitting and homogeneous self-dual embedding”. Journal of Optimization Theory and Applications. 169(3): 1042-1068. · Zbl 1342.90136
[251] Olshausen, B. A. and D. J. Field. (1996). “Emergence of simple-cell receptive field properties by learning a sparse code for natural images”. Nature. 381(6583): 607-609.
[252] Osa, T., J. Pajarinen, G. Neumann, J. A. Bagnell, P. Abbeel, and J. Peters. (2018). “An algorithmic perspective on imitation learning”. Foundations and Trends® in Robotics. 7(1-2): 1-179.
[253] Pan, X., M. Chen, T. Zhao, and S. H. Low. (2020). “DeepOPF: A feasibility-optimized deep neural network approach for AC optimal power flow problems”. ArXiv preprint. abs/2007.01002.
[254] Parikh, N. and S. Boyd. (2014). “Proximal algorithms”. Foundations and Trends® in Optimization. 1(3): 127-239.
[255] Parmas, P., C. E. Rasmussen, J. Peters, and K. Doya. (2018). “PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos”. In: Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. Vol. 80. Proceedings of Machine Learning Research. PMLR. 4062-4071. url: http://proceedings.mlr.press/v80/parmas18a.html.
[256] Parmas, P. and M. Sugiyama. (2021). “A unified view of likelihood ratio and reparameterization gradients”. In: The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event. Vol. 130. Proceedings of Machine Learning Research. PMLR. 4078-4086. url: http://proceedings.mlr.press/v130/parmas21a.html.
[257] Pascanu, R., T. Mikolov, and Y. Bengio. (2013). “On the difficulty of training recurrent neural networks”. In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013. Vol. 28. JMLR Workshop and Conference Proceedings. JMLR.org. 1310-1318. url: http://proceedings.mlr.press/v28/pascanu13.html.
[258] Paszke, A., S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. (2019). “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. 8024-8035. url: https://proceedings.neurips.cc/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html.
[259] Pearlmutter, B. A. (1994). “Fast exact multiplication by the Hessian”. Neural Computation. 6(1): 147-160.
[260] Pearlmutter, B. A. (1996). “An investigation of the gradient descent process in neural networks”. PhD thesis. Carnegie Mellon University.
[261] Pearlmutter, B. A. and J. M. Siskind. (2008). “Reverse-mode AD in a functional framework: Lambda the ultimate backpropagator”. ACM Transactions on Programming Languages and Systems (TOPLAS). 30(2): 1-36.
[262] Pennec, X. (2006). “Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements”. Journal of Mathematical Imaging and Vision. 25(1): 127-154. · Zbl 1478.94072
[263] Peyré, G. and M. Cuturi. (2019). “Computational optimal transport: With applications to data science”. Foundations and Trends® in Machine Learning. 11(5-6): 355-607.
[264] Poli, M., S. Massaroli, A. Yamashita, H. Asama, and J. Park. (2020). “Hypersolvers: Toward Fast Continuous-Depth Models”. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual. url: https://proceedings.neurips.cc/paper/2020/hash/f1686b4badcf28d33ed632036c7ab0b8-Abstract.html.
[265] Prémont-Schwarz, I., J. Vitku, and J. Feyereisl. (2022). “A Simple Guard for Learned Optimizers”. In: International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Vol. 162. Proceedings of Machine Learning Research. PMLR. 17910-17925. url: https://proceedings.mlr.press/v162/premont-schwarz22a.html.
[266] Raghu, A., M. Raghu, S. Bengio, and O. Vinyals. (2020). “Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML”. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. url: https://openreview.net/forum?id=rkgMkCEtPB.
[267] Rajeswaran, A., C. Finn, S. M. Kakade, and S. Levine. (2019). “Meta-Learning with Implicit Gradients”. In: Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada. 113-124. url: https://proceedings.neurips.cc/paper/2019/hash/072b030ba126b2f4b2374f342be9ed44-Abstract.html.
[268] Ravi, S. and A. Beatson. (2019). “Amortized Bayesian Meta-Learning”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=rkgpy3C5tX.
[269] Ravi, S. and H. Larochelle. (2017). “Optimization as a Model for Few-Shot Learning”. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. url: https://openreview.net/forum?id=rJY0-Kcll.
[270] Real, E., C. Liang, D. R. So, and Q. V. Le. (2020). “AutoML-Zero: Evolving Machine Learning Algorithms From Scratch”. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Vol. 119. Proceedings of Machine Learning Research. PMLR. 8007-8019. url: http://proceedings.mlr.press/v119/real20a.html.
[271] Rezende, D. J. and S. Mohamed. (2015). “Variational Inference with Normalizing Flows”. In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. Vol. 37. JMLR Workshop and Conference Proceedings. JMLR.org. 1530-1538. url: http://proceedings.mlr.press/v37/rezende15.html.
[272] Rezende, D. J., S. Mohamed, and D. Wierstra. (2014). “Stochastic Backpropagation and Approximate Inference in Deep Generative Models”. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014. Vol. 32. JMLR Workshop and Conference Proceedings. JMLR.org. 1278-1286. url: http://proceedings.mlr.press/v32/rezende14.html.
[273] Rezende, D. J. and F. Viola. (2018). “Taming VAEs”. ArXiv preprint. abs/1810.00597.
[274] Al-Rfou, R., G. Alain, A. Almahairi, C. Angermueller, D. Bahdanau, N. Ballas, F. Bastien, J. Bayer, A. Belikov, A. Belopolsky, et al. (2016). “Theano: A Python framework for fast computation of mathematical expressions”. ArXiv preprint. abs/1605.02688.
[275] Richter, S. L. and R. A. Decarlo. (1983). “Continuation methods: Theory and applications”. IEEE Transactions on Systems, Man, and Cybernetics. SMC-13(4): 459-464.
[276] Ronneberger, O., P. Fischer, and T. Brox. (2015). “U-net: Convolutional networks for biomedical image segmentation”. In: International Conference on Medical Image Computing and Computer-assisted Intervention. Springer. 234-241.
[277] Ruder, S. (2017). “An overview of multi-task learning in deep neural networks”. ArXiv preprint. abs/1706.05098.
[278] Runarsson, T. P. and M. T. Jonsson. (2000). “Evolution and design of distributed learning rules”. In: 2000 IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks. IEEE. 59-63.
[279] Russakovsky, O., J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, and M. Bernstein. (2015). “ImageNet large scale visual recognition challenge”. International Journal of Computer Vision. 115(3): 211-252.
[280] Rusu, A. A., D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell. (2019). “Meta-Learning with Latent Embedding Optimization”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=BJgklhAcK7.
[281] Ryu, M., Y. Chow, R. Anderson, C. Tjandraatmadja, and C. Boutilier. (2020). “CAQL: Continuous Action Q-Learning”. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. url: https://openreview.net/forum?id=BkxXe0Etwr.
[282] Sacks, J. and B. Boots. (2022). “Learning to Optimize in Model Predictive Control”. In: 2022 International Conference on Robotics and Automation (ICRA). IEEE. 10549-10556.
[283] Sakaue, S. and T. Oki. (2022). “Discrete-Convex-Analysis-Based Framework for Warm-Starting Algorithms with Predictions”. In: Advances in Neural Information Processing Systems. Ed. by A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho. url: https://openreview.net/forum?id=-GgDBzwZ-e7.
[284] Salakhutdinov, R. (2014). “Deep learning”. In: The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA, August 24-27, 2014. ACM. 1973. doi: 10.1145/2623330.2630809.
[285] Salimans, T. and J. Ho. (2022). “Progressive Distillation for Fast Sampling of Diffusion Models”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=TIdIXIpzhoI.
[286] Sanchez-Gonzalez, A., J. Godwin, T. Pfaff, R. Ying, J. Leskovec, and P. W. Battaglia. (2020). “Learning to Simulate Complex Physics with Graph Networks”. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Vol. 119. Proceedings of Machine Learning Research. PMLR. 8459-8468. url: http://proceedings.mlr.press/v119/sanchez-gonzalez20a.html.
[287] Santambrogio, F. (2015). “Optimal transport for applied mathematicians”. Birkhäuser Cham. · Zbl 1401.49002
[288] Scherrer, B. (2010). “Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view”. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel. Omnipress. 959-966. url: https://icml.cc/Conferences/2010/papers/654.pdf.
[289] Schmidhuber, J. (1987). “Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook”. PhD thesis. Technische Universität München.
[290] Schmidhuber, J. (1995). “On learning how to learn learning strategies”. Tech. rep. TU München.
[291] Schwarzschild, A., E. Borgnia, A. Gupta, F. Huang, U. Vishkin, M. Goldblum, and T. Goldstein. (2021). “Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks”. In: Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual. 6695-6706. url: https://proceedings.neurips.cc/paper/2021/hash/3501672ebc68a5524629080e3ef60aef-Abstract.html.
[292] Sercu, T., R. Verkuil, J. Meier, B. Amos, Z. Lin, C. Chen, J. Liu, Y. LeCun, and A. Rives. (2021). “Neural Potts Model”. bioRxiv.
[293] Shaban, A., C. Cheng, N. Hatch, and B. Boots. (2019). “Truncated Back-propagation for Bilevel Optimization”. In: The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16-18 April 2019, Naha, Okinawa, Japan. Vol. 89. Proceedings of Machine Learning Research. PMLR. 1723-1732. url: http://proceedings.mlr.press/v89/shaban19a.html.
[294] Shao, Z., J. Yang, C. Shen, and S. Ren. (2021). “Learning for Robust Combinatorial Optimization: Algorithm and Application”. ArXiv preprint. abs/2112.10377.
[295] Shapiro, A. (2003). “Sensitivity Analysis of Generalized Equations.” Journal of Mathematical Sciences. 115(4). · Zbl 1136.90482
[296] Shu, R. (2017). “Amortized Optimization”. url: https://ruishu.io/2017/11/07/amortized-optimization/.
[297] Silver, D., A. Goyal, I. Danihelka, M. Hessel, and H. van Hasselt. (2022). “Learning by Directional Gradient Descent”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=5i7lJLuhTm.
[298] Silver, D., G. Lever, N. Heess, T. Degris, D. Wierstra, and M. A. Riedmiller. (2014). “Deterministic Policy Gradient Algorithms”. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014. Vol. 32. JMLR Workshop and Conference Proceedings. JMLR.org. 387-395. url: http://proceedings.mlr.press/v32/silver14.html.
[299] Sjölund, J. (2023). “A Tutorial on Parametric Variational Inference”. ArXiv preprint. abs/2301.01236.
[300] Sjölund, J. and M. Bånkestad. (2022). “Graph-based Neural Acceleration for Nonnegative Matrix Factorization”. ArXiv preprint. abs/2202.00264.
[301] Smola, A. J., S. V. N. Vishwanathan, and Q. V. Le. (2007). “Bundle Methods for Machine Learning”. In: Advances in Neural Information Processing Systems 20, Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, December 3-6, 2007. Curran Associates, Inc. 1377-1384. url: https://proceedings.neurips.cc/paper/2007/hash/26337353b7962f533d78c762373b3318-Abstract.html.
[302] Sønderby, C. K., T. Raiko, L. Maaløe, S. K. Sønderby, and O. Winther. (2016). “Ladder Variational Autoencoders”. In: Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. 3738-3746. url: https://proceedings.neurips.cc/paper/2016/hash/6ae07dcb33ec3b7c814df797cbda0f87-Abstract.html.
[303] Stanley, K. O., D. B. D’Ambrosio, and J. Gauci. (2009). “A hypercube-based encoding for evolving large-scale neural networks”. Artificial Life. 15(2): 185-212.
[304] Stellato, B., G. Banjac, P. Goulart, A. Bemporad, and S. Boyd. (2018). “OSQP: An operator splitting solver for quadratic programs”. In: UKACC 12th International Conference on Control (CONTROL). · Zbl 1452.90236
[305] Still, G. (2018). “Lectures on parametric optimization: An introduction”. Optimization Online.
[306] Stuhlmüller, A., J. Taylor, and N. D. Goodman. (2013). “Learning Stochastic Inverses”. In: Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States. 3048-3056. url: https://proceedings.neurips.cc/paper/2013/hash/7f53f8c6c730af6aeb52e66eb74d8507-Abstract.html.
[307] Sutton, R. S. and A. G. Barto. (2018). Reinforcement learning: An introduction. MIT Press. · Zbl 1407.68009
[308] Swersky, K., Y. Rubanova, D. Dohan, and K. Murphy. (2020). “Amortized Bayesian Optimization over Discrete Spaces”. In: Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI 2020, virtual online, August 3-6, 2020. Vol. 124. Proceedings of Machine Learning Research. AUAI Press. 769-778. url: http://proceedings.mlr.press/v124/swersky20a.html.
[309] Szegedy, C., V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. (2016). “Rethinking the Inception Architecture for Computer Vision”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society. 2818-2826. doi: 10.1109/CVPR.2016.308.
[310] Taghvaei, A. and A. Jalali. (2019). “2-Wasserstein approximation via restricted convex potentials with application to improved training for GANs”. ArXiv preprint. abs/1902.07197.
[311] Tallec, C. and Y. Ollivier. (2017). “Unbiasing truncated backpropagation through time”. ArXiv preprint. abs/1705.08209.
[312] Tallec, C. and Y. Ollivier. (2018). “Unbiased Online Recurrent Optimization”. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. url: https://openreview.net/forum?id=rJQDjk-0b.
[313] Tennenholtz, G. and S. Mannor. (2019). “The Natural Language of Actions”. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Vol. 97. Proceedings of Machine Learning Research. PMLR. 6196-6205. url: http://proceedings.mlr.press/v97/tennenholtz19a.html.
[314] Thornton, J. and M. Cuturi. (2022). “Rethinking Initialization of the Sinkhorn Algorithm”. ArXiv preprint. abs/2206.07630.
[315] Thrun, S. and L. Pratt. (1998). “Learning to learn: Introduction and overview”. In: Learning to Learn. Springer. 3-17. · Zbl 0891.68079
[316] Usman, A., M. Rafiq, M. Saeed, A. Nauman, A. Almqvist, and M. Liwicki. (2021). “Machine Learning Computational Fluid Dynamics”. In: 2021 Swedish Artificial Intelligence Society Workshop (SAIS). IEEE. 1-4.
[317] Van de Wiele, T., D. Warde-Farley, A. Mnih, and V. Mnih. (2020). “Q-learning in enormous action spaces via amortized approximate maximization”. ArXiv preprint. abs/2001.08116.
[318] Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. (2017). “Attention is All you Need”. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 5998-6008. url: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
[319] Venkatakrishnan, S. V., C. A. Bouman, and B. Wohlberg. (2013). “Plug-and-play priors for model based reconstruction”. In: IEEE Global Conference on Signal and Information Processing. IEEE. 945-948.
[320] Venkataraman, S. and B. Amos. (2021). “Neural Fixed-Point Acceleration for Convex Optimization”. ArXiv preprint. abs/2107.10254.
[321] Vicol, P., L. Metz, and J. Sohl-Dickstein. (2021). “Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies”. In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Vol. 139. Proceedings of Machine Learning Research. PMLR. 10553-10563. url: http://proceedings.mlr.press/v139/vicol21a.html.
[322] Vilalta, R. and Y. Drissi. (2002). “A perspective view and survey of meta-learning”. Artificial Intelligence Review. 18(2): 77-95.
[323] Villani, C. (2009). Optimal transport: old and new. Vol. 338. Springer.
Vinuesa, R. and S. L. Brunton. (2021). “The potential of machine learning to enhance computational fluid dynamics”. ArXiv preprint. abs/2110.02085.
[324] Wainwright, M. J. and M. I. Jordan. (2008). “Graphical models, exponential families, and variational inference”. Foundations and Trends® in Machine Learning. 1(1-2): 1-305. · Zbl 1193.62107
[325] Walker, H. F. and P. Ni. (2011). “Anderson acceleration for fixed-point iterations”. SIAM Journal on Numerical Analysis. 49(4): 1715-1735. · Zbl 1254.65067
[326] Wang, H., H. Zhao, and B. Li. (2021). “Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation”. In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event. Vol. 139. Proceedings of Machine Learning Research. PMLR. 10991-11002. url: http://proceedings.mlr.press/v139/wang21ad.html.
[327] Wang, T. and J. Ba. (2020). “Exploring Model-based Planning with Policy Networks”. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. url: https://openreview.net/forum?id=H1exf64KwH.
[328] Ward, L. B. (1937). “Reminiscence and rote learning.” Psychological Monographs. 49(4): i.
[329] Watkins, C. J. and P. Dayan. (1992). “Q-learning”. Machine Learning. 8(3-4): 279-292. · Zbl 0773.68062
[330] Watson, L. T. and R. T. Haftka. (1989). “Modern homotopy methods in optimization”. Computer Methods in Applied Mechanics and Engineering. 74(3): 289-305. · Zbl 0693.65046
[331] Webb, S., A. Golinski, R. Zinkov, S. Narayanaswamy, T. Rainforth, Y. W. Teh, and F. Wood. (2018). “Faithful Inversion of Generative Models for Effective Amortized Inference”. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada. 3074-3084. url: https://proceedings.neurips.cc/paper/2018/hash/894b77f805bd94d292574c38c5d628d5-Abstract.html.
[332] Weng, L. (2018). “Meta-Learning: Learning to Learn Fast”. url: http://lilianweng.github.io/lil-log.
[333] Werbos, P. J. (1990). “Backpropagation through time: what it does and how to do it”. Proceedings of the IEEE. 78(10): 1550-1560.
[334] Wichrowska, O., N. Maheswaranathan, M. W. Hoffman, S. G. Colmenarejo, M. Denil, N. de Freitas, and J. Sohl-Dickstein. (2017). “Learned Optimizers that Scale and Generalize”. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. Vol. 70. Proceedings of Machine Learning Research. PMLR. 3751-3760. url: http://proceedings.mlr.press/v70/wichrowska17a.html.
[335] Wiewel, S., M. Becher, and N. Thuerey. (2019). “Latent space physics: Towards learning the temporal evolution of fluid flow”. In: Computer Graphics Forum. Vol. 38. Wiley Online Library. 71-82.
[336] Williams, R. J. (1992). “Simple statistical gradient-following algorithms for connectionist reinforcement learning”. Reinforcement Learning: 5-32.
[337] Williams, R. J. and D. Zipser. (1989). “A learning algorithm for continually running fully recurrent neural networks”. Neural Computation. 1(2): 270-280.
[338] Wu, M., K. Choi, N. D. Goodman, and S. Ermon. (2020). “Meta-Amortized Variational Inference and Learning”. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press. 6404-6412. url: https://aaai.org/ojs/index.php/AAAI/article/view/6111.
[339] Wu, Y., M. Ren, R. Liao, and R. B. Grosse. (2018). “Understanding Short-Horizon Bias in Stochastic Meta-Optimization”. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. url: https://openreview.net/forum?id=H1MczcgR-.
[340] Xiao, Y., E. P. Xing, and W. Neiswanger. (2021). “Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for Hyperparameter Recommendation”. ArXiv preprint. abs/2106.09179.
[341] Xie, K., H. Bharadhwaj, D. Hafner, A. Garg, and F. Shkurti. (2021). “Latent Skill Planning for Exploration and Transfer”. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. url: https://openreview.net/forum?id=jXe91kq3jAq.
[342] Xue, T., A. Beatson, S. Adriaenssens, and R. P. Adams. (2020). “Amortized Finite Element Analysis for Fast PDE-Constrained Optimization”. In: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. Vol. 119. Proceedings of Machine Learning Research. PMLR. 10638-10647. url: http://proceedings.mlr.press/v119/xue20a.html.
[343] You, Y., Y. Cao, T. Chen, Z. Wang, and Y. Shen. (2022). “Bayesian Modeling and Uncertainty Quantification for Learning to Optimize: What, Why, and How”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=EVVadRFRgL7.
[344] Yu, F. and V. Koltun. (2016). “Multi-Scale Context Aggregation by Dilated Convolutions”. In: 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. url: http://arxiv.org/abs/1511.07122.
[345] Zaheer, M., S. Kottur, S. Ravanbakhsh, B. Póczos, R. Salakhutdinov, and A. J. Smola. (2017). “Deep Sets”. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 3391-3401. url: https://proceedings.neurips.cc/paper/2017/hash/f22e4747da1aa27e363d86d40ff442fe-Abstract.html.
[346] Zamzam, A. S. and K. Baker. (2020). “Learning optimal solutions for extremely fast AC optimal power flow”. In: IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). IEEE. 1-6.
[347] Zeiler, M. D. (2012). “Adadelta: an adaptive learning rate method”. ArXiv preprint. abs/1212.5701.
[348] Zhang, C. and V. R. Lesser. (2010). “Multi-Agent Learning with Policy Prediction”. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010. AAAI Press. url: http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/1885.
[349] Zhang, C., M. Ren, and R. Urtasun. (2019a). “Graph HyperNetworks for Neural Architecture Search”. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. url: https://openreview.net/forum?id=rkgW0oA9FX.
[350] Zhang, J., B. O’Donoghue, and S. Boyd. (2020). “Globally convergent type-I Anderson acceleration for nonsmooth fixed-point iterations”. SIAM Journal on Optimization. 30(4): 3170-3197. · Zbl 1525.47126
[351] Zhang, K., W. Zuo, S. Gu, and L. Zhang. (2017). “Learning Deep CNN Denoiser Prior for Image Restoration”. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society. 2808-2817. doi: 10.1109/CVPR.2017.300.
[352] Zhang, X., M. Bujarbaruah, and F. Borrelli. (2019b). “Safe and near-optimal policy learning for model predictive control using primal-dual neural networks”. In: American Control Conference (ACC). IEEE. 354-359.
[353] Zheng, W., T. Chen, T. Hu, and Z. Wang. (2022). “Symbolic Learning to Optimize: Towards Interpretability and Scalability”. In: The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. url: https://openreview.net/forum?id=ef0nInZHKIC.
[354] Zhmoginov, A., M. Sandler, and M. Vladymyrov. (2022). “HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning”. In: International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA. Vol. 162. Proceedings of Machine Learning Research. PMLR. 27075-27098. url: https://proceedings.mlr.press/v162/zhmoginov22a.html.
[355] Zintgraf, L. M., K. Shiarlis, V. Kurin, K. Hofmann, and S. Whiteson. (2019). “Fast Context Adaptation via Meta-Learning”. In: Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. Vol. 97. Proceedings of Machine Learning Research. PMLR. 7693-7702. url: http://proceedings.mlr.press/v97/zintgraf19a.html.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible without claiming completeness or perfect matching.