×

Stochastic approximation: from statistical origin to big-data, multidisciplinary applications. (English) Zbl 07368238

Summary: Stochastic approximation was introduced in 1951 to provide a new theoretical framework for root finding and optimization of a regression function in the then-nascent field of statistics. This review shows how it has evolved in response to other developments in statistics, notably time series and sequential analysis, and to applications in artificial intelligence, economics and engineering. Its resurgence in the big data era has led to new advances in both theory and applications of this microcosm of statistics and data science.

MSC:

62-XX Statistics
Full Text: DOI

References:

[1] Anderson, T. W. and Taylor, J. B. (1976). Some experimental results on the statistical properties of least squares estimates in control problems. Econometrica 44 1289-1302.
[2] Åstöm, K. J. and Wittenmark, B. (1973). On self-tuning regulators. Automatica 9 189-199. · Zbl 0249.93049
[3] Ȧström, K. J. and Wittenmark, B. (1971). Problems of identification and control. J. Math. Anal. Appl. 34 90-113. · Zbl 0217.57903 · doi:10.1016/0022-247X(71)90161-2
[4] Basu, S. and Michailidis, G. (2015). Regularized estimation in sparse high-dimensional time series models. Ann. Statist. 43 1535-1567. · Zbl 1317.62067 · doi:10.1214/15-AOS1315
[5] Blum, J. R. (1954). Approximation methods which converge with probability one. Ann. Math. Stat. 25 382-386. · Zbl 0055.37806 · doi:10.1214/aoms/1177728794
[6] Bonyadi, M. and Michalewicz, Z. (2016). Analysis of stability, local convergence, and transformation sensitivity of a variant of particle swarm optimization algorithm. IEEE Trans. Evol. Comput. 20 370-385.
[7] Box, G. E. P. and Wilson, K. B. (1951). On the experimental attainment of optimum conditions. J. Roy. Statist. Soc. Ser. B 13 1-38; discussion: 38-45. · Zbl 0043.34402
[8] Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559-583. · Zbl 1095.62077 · doi:10.1214/009053606000000092
[9] Choi, K. P., Lai, T. L., Tong, X. T. and Wong, W. K. (2021). A statistical approach to adaptive parameter tuning in nature-inspired optimization and optimal sequential design of dose-finding trials. Statist. Sinica 31. To appear. · Zbl 1524.62524
[10] Clerc, M. (2006). Particle Swarm Optimization. ISTE, London. · Zbl 1130.90059 · doi:10.1002/9780470612163
[11] de la Peña, V. H., Klass, M. J. and Lai, T. L. (2009). Theory and applications of multivariate self-normalized processes. Stochastic Process. Appl. 119 4210-4227. · Zbl 1190.60017 · doi:10.1016/j.spa.2009.10.003
[12] Efron, B. (2003). Robbins, empirical Bayes and microarrays. Dedicated to the memory of Herbert E. Robbins. Ann. Statist. 31 366-378. · Zbl 1038.62099 · doi:10.1214/aos/1051027871
[13] Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of online learning and an application to boosting. J. Comput. System Sci. 55 119-139. · Zbl 0880.68103
[14] Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189-1232. · Zbl 1043.62034 · doi:10.1214/aos/1013203451
[15] Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. Ann. Statist. 28 337-407. With discussion and a rejoinder by the authors. · Zbl 1106.62323 · doi:10.1214/aos/1016218223
[16] Fuchs, J.-J. J. (1982). Indirect stochastic adaptive control: The general delay-white noise case. IEEE Trans. Automat. Control 27 219-223. · Zbl 0471.93068 · doi:10.1109/TAC.1982.1102827
[17] Goodwin, G. C., Ramadge, P. J. and Caines, P. E. (1981). Discrete time stochastic adaptive control. SIAM J. Control Optim. 19 829-853. · Zbl 0473.93075 · doi:10.1137/0319052
[18] Guo, L., Huang, D. W. and Hannan, E. J. (1990). On \[\text{ARX}(\infty )\] approximation. J. Multivariate Anal. 32 17-47. · Zbl 0688.62051 · doi:10.1016/0047-259X(90)90069-T
[19] Hannan, E. J. (1987). Rational transfer function approximation. Statist. Sci. 2 135-161. With comments and a reply by the author. · Zbl 0638.62091
[20] Hassabis, D., Kumaran, D., Summerfield, C. and Botvinick, M. (2017). Neuroscience-inspired artificial intelligence. Neuron 95 245-258. · doi:10.1016/j.neuron.2017.06.011
[21] Hotelling, H. (1941). Experimental determination of the maximum of a function. Ann. Math. Stat. 12 20-45. · Zbl 0024.43104 · doi:10.1214/aoms/1177731784
[22] Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statist. Sinica 21 1473-1513. · Zbl 1225.62095 · doi:10.5705/ss.2010.081
[23] Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization. In Proc. IEEE Int. Conf. Neural Networks 4 1942-1048, Perth, Australia.
[24] Kiefer, J. (1953). Sequential minimax search for a maximum. Proc. Amer. Math. Soc. 4 502-506. · Zbl 0050.35702 · doi:10.2307/2032161
[25] Kiefer, J. (1957). Optimum sequential search and approximation methods under minimum regularity assumptions. J. Soc. Indust. Appl. Math. 5 105-136. · Zbl 0081.35801
[26] Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation of the maximum of a regression function. Ann. Math. Stat. 23 462-466. · Zbl 0049.36601 · doi:10.1214/aoms/1177729392
[27] Kumar, P. R. and Varaiya, P. (1986). Stochastic Systems, 1st ed. ed. Prentice-Hall, Englewood Cliffs, NJ. · Zbl 0706.93057
[28] Kushner, H. J. and Clark, D. S. (1978). Stochastic Approximation Methods for Constrained and Unconstrained Systems. Applied Mathematical Sciences 26. Springer, New York-Berlin. · Zbl 0381.60004
[29] Lai, T. L. (1987). Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15 1091-1114. · Zbl 0643.62054 · doi:10.1214/aos/1176350495
[30] Lai, T. L. (1989). Extended stochastic Lyapunov functions and recursive algorithms in linear stochastic systems. In Stochastic Differential Systems (Bad Honnef, 1988) (N. Christopeit, K. Helmes and N. Kohlmann, eds.). Lect. Notes Control Inf. Sci. 126 206-220. Springer, Berlin. · Zbl 0696.93070 · doi:10.1007/BFb0043786
[31] Lai, T. L. (2003). Stochastic approximation. Dedicated to the memory of Herbert E. Robbins. Ann. Statist. 31 391-406. · Zbl 1039.62077 · doi:10.1214/aos/1051027873
[32] Lai, T. L., Choi, A. and Tsang, K. W. (2019). Statistical science in information technology and precision medicine. Ann. Math. Sci. Appl. 4 413-438. · Zbl 1432.62011 · doi:10.4310/AMSA.2019.v4.n2.a6
[33] Lai, T. L. and Robbins, H. (1979). Adaptive design and stochastic approximation. Ann. Statist. 7 1196-1221. · Zbl 0426.62059
[34] Lai, T. L. and Robbins, H. (1981). Consistency and asymptotic efficiency of slope estimates in stochastic approximation schemes. Z. Wahrsch. Verw. Gebiete 56 329-360. · Zbl 0472.62089 · doi:10.1007/BF00536178
[35] Lai, T. L. and Robbins, H. (1982a). Adaptive design and the multiperiod control problem. In Statistical Decision Theory and Related Topics, III, Vol. 2 (West Lafayette, Ind., 1981) 103-120. Academic Press, New York. · Zbl 0573.62058
[36] Lai, T. L. and Robbins, H. (1982b). Iterated least squares in multiperiod control. Adv. in Appl. Math. 3 50-73. · Zbl 0489.62073 · doi:10.1016/S0196-8858(82)80005-5
[37] Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4-22. · Zbl 0568.62074 · doi:10.1016/0196-8858(85)90002-8
[38] Lai, T. L. and Siegmund, D. (1986). The contributions of Herbert Robbins to mathematical statistics. Statist. Sci. 1 276-284. · Zbl 0606.62003
[39] Lai, T. L. and Wei, C. Z. (1982). Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist. 10 154-166. · Zbl 0649.62060
[40] Lai, T. L. and Wei, C. Z. (1986). Extended least squares and their applications to adaptive control and prediction in linear systems. IEEE Trans. Automat. Control 31 898-906. · Zbl 0603.93060 · doi:10.1109/TAC.1986.1104138
[41] Lai, T. L., Xia, T. and Yuan, H. (2020). Modified gradient boosting in high-dimensional nonlinear regression. Technical report, Dept. Statistics, Stanford Univ.
[42] Lai, T. L., Xu, H. and Yuan, H. (2020). Self-normalized martingales, recursive orthogonal matching pursuit and modified gradient boosting in high-dimensional stochastic regression models. Technical report, Dept. Statistics, Stanford Univ.
[43] Lai, T. L. and Ying, Z. (1991a). Recursive identification and adaptive prediction in linear stochastic systems. SIAM J. Control Optim. 29 1061-1090. · Zbl 0756.93081 · doi:10.1137/0329058
[44] Lai, T. L. and Ying, Z. (1991b). Parallel recursive algorithms in asymptotically efficient adaptive control of linear stochastic systems. SIAM J. Control Optim. 29 1091-1127. · Zbl 0746.93078 · doi:10.1137/0329059
[45] Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Trans. Automat. Control AC-22 551-575. · Zbl 0362.93031 · doi:10.1109/tac.1977.1101561
[46] Luckett, D. J., Laber, E. B., Kahkoska, A. R., Maahs, D. M., Mayer-Davis, E. and Kosorok, M. R. (2020). Estimating dynamic treatment regimes in mobile health using V-learning. J. Amer. Statist. Assoc. 115 692-706. · Zbl 1445.62279 · doi:10.1080/01621459.2018.1537919
[47] Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41 3397-3415. · Zbl 0842.94004
[48] Nemirovski, A. and Yudin, D. (1978). On Cezari’s convergence of the steepest descent method for approximating saddle points of convex-concave functions. Sov. Math., Dokl. 19. · Zbl 0413.90061
[49] Nemirovsky, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. A Wiley-Interscience Publication. Wiley, New York. · Zbl 0501.90062
[50] Page, W. (1984). An interview with Herbert Robbins. College Math. J. 15 2-24. · Zbl 0995.01513 · doi:10.2307/3027425
[51] Peligrad, M., Sang, H., Zhong, Y. and Wu, W. B. (2014). Exact moderate and large deviations for linear processes. Statist. Sinica 24 957-969. · Zbl 1285.62110
[52] Polyak, B. T. (1990). A new method of stochastic approximation type. Avtomat. i Telemekh. 7 98-107.
[53] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing \[{\ell_1}\]-penalized log-determinant divergence. Electron. J. Stat. 5 935-980. · Zbl 1274.62190 · doi:10.1214/11-EJS631
[54] Robbins, H. E. (1944). Two properties of the function \[\cos x\]. Bull. Amer. Math. Soc. 50 750-752. · Zbl 0060.19107 · doi:10.1090/S0002-9904-1944-08232-3
[55] Robbins, H. E. (1945). On the measure of a random set. II. Ann. Math. Stat. 16 342-347. · Zbl 0060.29406 · doi:10.1214/aoms/1177731060
[56] Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527-535. · Zbl 0049.37009 · doi:10.1090/S0002-9904-1952-09620-8
[57] Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Stat. 22 400-407. · Zbl 0054.05901 · doi:10.1214/aoms/1177729586
[58] Robbins, H. and Siegmund, D. (1971). A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971) 233-257. · Zbl 0286.60025
[59] Ruppert, D. (1985). A Newton-Raphson version of the multivariate Robbins-Monro procedure. Ann. Statist. 13 236-245. · Zbl 0571.62072 · doi:10.1214/aos/1176346589
[60] Ruppert, D. (1991). Stochastic approximation. In Handbook of Sequential Analysis (B. K. Ghosh and P. K. Sen, eds.). Statist. Textbooks Monogr. 118 503-529. Dekker, New York. · Zbl 0753.62046
[61] Sakrison, D. J. (1967). The use of stochastic approximation to solve the system identification problem. IEEE Trans. Automat. Control 12 563-567.
[62] Saridis, G. and Stein, G. (1968). A new algorithm for linear system identification. IEEE Trans. Automat. Control 13 592-594.
[63] Siegmund, D. (2003). Herbert Robbins and sequential analysis. Dedicated to the memory of Herbert E. Robbins. Ann. Statist. 31 349-365. · Zbl 1039.62075 · doi:10.1214/aos/1051027870
[64] Spall, J. C. (1992). Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Trans. Automat. Control 37 332-341. · Zbl 0745.60110 · doi:10.1109/9.119632
[65] Spall, J. C. (2000). Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Automat. Control 45 1839-1853. · Zbl 0990.93125 · doi:10.1109/TAC.2000.880982
[66] Spall, J. C. (2003). Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, Hoboken, NJ. · Zbl 1088.90002 · doi:10.1002/0471722138
[67] Temlyakov, V. N. (2000). Weak greedy algorithms. Adv. Comput. Math. 12 213-227. · Zbl 0964.65009 · doi:10.1023/A:1018917218956
[68] Tewari, A. and Murphy, S. A. (2017). From ads to interventions: Contextual bandits in mobile health. In Mobile Health (J. Re, S. A. Murphy and S. Kumar, eds.) 495-517. Springer, Berlin.
[69] Tomkins, S., Liao, P., Yeung, S., Klasnja, P. and Murphy, S. A. (2019). Intelligent pooling in Thompson sampling for rapid personalization in mobile health. ICML 2019 Workshop RL4RealLife, 2019.
[70] Wei, C. Z. (1987). Multivariate adaptive stochastic approximation. Ann. Statist. 15 1115-1130. · Zbl 0676.62064 · doi:10.1214/aos/1176350496
[71] Widrow, B. and Hoff, M. E. (1960). Adaptive switching circuits. In Proc. IRE WESCON Convention Record, Part 4 96-104.
[72] Wu, W.-B. and Wu, Y. N. (2016). Performance bounds for parameter estimates of high-dimensional linear models with correlated errors. Electron. J. Stat. 10 352-379. · Zbl 1333.62172 · doi:10.1214/16-EJS1108
[73] Yuan, Q. and Yin, G. (2015). Analyzing convergence and rates of convergence of particle swarm optimization algorithms using stochastic approximation methods. IEEE Trans. Automat. Control 60 1760-1773. · Zbl 1360.90312 · doi:10.1109/TAC.2014.2381454
[74] Zhang, C.-H. (2003). Compound decision theory and empirical Bayes methods. Dedicated to the memory of Herbert E. Robbins. Ann. Statist. 31 379-390 · Zbl 1039.62005
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.