On obtaining sparse semantic solutions for inverse problems, control, and neural network training. (English) Zbl 07515412

Summary: Modern-day techniques for designing neural network architectures are highly reliant on trial and error, heuristics, and so-called best practices, without much rigorous justification. After choosing a network architecture, an energy function (or loss) is minimized, choosing from a wide variety of optimization and regularization methods. Given the ad-hoc nature of network architecture design, it would be useful if the optimization led to a sparse solution so that one could ascertain the importance or unimportance of various parts of the network architecture. Of course, historically, sparsity has always been a useful notion for inverse problems where researchers often prefer the \(L_1\) norm over \(L_2\). Similarly for control, one often includes the control variables in the objective function in order to minimize their efforts. Motivated by the design and training of neural networks, we propose a novel column space search approach that emphasizes the data over the model, as well as a novel iterative Levenberg-Marquardt algorithm that smoothly converges to a regularized SVD as opposed to the abrupt truncation inherent to PCA. In the case of our iterative Levenberg-Marquardt algorithm, it suffices to consider only the linearized subproblem in order to verify our claims. However, the claims we make about our novel column space search approach require examining the impact of the solution method for the linearized subproblem on the fully nonlinear original problem; thus, we consider a complex real-world inverse problem (determining facial expressions from RGB images).
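The contrast the summary draws between a smoothly regularized SVD and the abrupt truncation of PCA can be illustrated on a linear least-squares subproblem. The sketch below is not the paper's algorithm; it only shows the standard textbook relationship, assuming a single damped (Levenberg-Marquardt style) step from a zero initial guess with a hypothetical damping parameter `lam`. In the SVD basis, such a step scales each singular component by the filter factor \(\sigma_i^2 / (\sigma_i^2 + \lambda)\), whereas truncation keeps the first `k` components unchanged and discards the rest. The function names `damped_solve` and `truncated_solve` are illustrative only.

```python
import numpy as np


def damped_solve(A, b, lam):
    """Minimize ||A x - b||^2 + lam * ||x||^2 via the SVD (requires lam > 0).

    Returns the solution and the per-component filter factors
    sigma_i^2 / (sigma_i^2 + lam), which vary smoothly between 0 and 1.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    filt = s**2 / (s**2 + lam)
    # sigma / (sigma^2 + lam) equals filt / sigma but is safe when sigma == 0.
    coef = s / (s**2 + lam) * (U.T @ b)
    return Vt.T @ coef, filt


def truncated_solve(A, b, k):
    """Keep only the k largest singular components (PCA-style truncation).

    Returns the solution and the 0/1 filter factors. Assumes the k largest
    singular values are nonzero and k does not exceed their count.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    filt = (np.arange(s.size) < k).astype(float)
    coef = np.zeros_like(s)
    coef[:k] = (U.T @ b)[:k] / s[:k]
    return Vt.T @ coef, filt


if __name__ == "__main__":
    rng = np.random.default_rng(0)      # arbitrary synthetic test data
    A = rng.standard_normal((8, 5))
    b = rng.standard_normal(8)

    _, f_damped = damped_solve(A, b, lam=0.1)
    _, f_trunc = truncated_solve(A, b, k=3)
    print("damped filter factors:   ", np.round(f_damped, 3))
    print("truncated filter factors:", f_trunc)
```

Printing the two sets of filter factors side by side shows the difference in behavior: the damped factors decay gradually as the singular values shrink, while the truncated factors jump from 1 to 0 at index `k`.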

MSC:

90Cxx Mathematical programming
68Txx Artificial intelligence
65Kxx Numerical methods for mathematical programming, optimization and variational techniques
