
Efficient covariance matrix update for variable metric evolution strategies. (English) Zbl 1470.68183

Summary: Randomized direct search algorithms for continuous domains, such as evolution strategies, are basic tools in machine learning. They are especially needed when the gradient of an objective function (e.g., loss, energy, or reward function) cannot be computed or estimated efficiently. Application areas include supervised and reinforcement learning as well as model selection. These randomized search strategies often rely on normally distributed additive variations of candidate solutions. In order to search efficiently in non-separable and ill-conditioned landscapes, the covariance matrix of the normal distribution must be adapted, amounting to a variable metric method. Consequently, covariance matrix adaptation (CMA) is considered state-of-the-art in evolution strategies. In order to sample from the normal distribution, the adapted covariance matrix needs to be decomposed, which in general requires \(\Theta(n^3)\) operations, where \(n\) is the search space dimension. We propose a new update mechanism which can replace a rank-one covariance matrix update and the computationally expensive decomposition of the covariance matrix. The newly developed update rule reduces the computational complexity of the rank-one covariance matrix adaptation to \(\Theta(n^2)\) without resorting to outdated distributions. We derive new versions of the elitist covariance matrix adaptation evolution strategy (CMA-ES) and the multi-objective CMA-ES. These algorithms are equivalent to the original procedures except that the update step for the variable metric distribution scales better in the problem dimension. We also introduce a simplified variant of the non-elitist CMA-ES with the incremental covariance matrix update and investigate its performance. Apart from the reduced time complexity of the distribution update, the algebraic computations involved in all new algorithms are simpler than in the original versions. The new update rule improves the performance of the CMA-ES for large-scale machine learning problems in which the objective function can be evaluated quickly.
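
The central idea can be illustrated by an incremental rank-one update of a Cholesky-type factor \(A\) of the covariance matrix \(C = AA^T\): when the update direction has the form \(Av\) for a known coefficient vector \(v\), the factor itself can be updated in \(\Theta(n^2)\) operations, so sampling never requires a decomposition of \(C\). The following NumPy code is a minimal sketch for illustration only, not the authors' reference implementation; the learning rate c_cov and the (1+1)-style usage are placeholder assumptions.

    import numpy as np

    def factor_rank_one_update(A, z, alpha, beta):
        # Return A' with A' A'^T = alpha * A A^T + beta * (A z)(A z)^T,
        # computed in O(n^2) time without decomposing the covariance matrix.
        v = A @ z                           # update direction in the search space
        z_sq = float(z @ z)                 # squared norm of the coefficient vector
        scale = (np.sqrt(alpha) / z_sq) * (np.sqrt(1.0 + (beta / alpha) * z_sq) - 1.0)
        return np.sqrt(alpha) * A + scale * np.outer(v, z)

    # Minimal usage sketch (hypothetical constants; selection/acceptance omitted):
    rng = np.random.default_rng(0)
    n = 10
    A = np.eye(n)                           # initial factor, i.e. C = identity
    c_cov = 2.0 / (n ** 2 + 6.0)            # example learning rate (an assumption)
    z = rng.standard_normal(n)              # standard normal coefficient vector
    step = A @ z                            # correlated mutation, distributed as N(0, C)
    # ... evaluate the candidate; on success, adapt the covariance factor:
    A = factor_rank_one_update(A, z, alpha=1.0 - c_cov, beta=c_cov)

Because the coefficient vector z is already known when the update is applied, no matrix inversion or eigendecomposition is needed, which is what brings the per-iteration cost of the rank-one adaptation down to quadratic in the dimension.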

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62J10 Analysis of variance and covariance (ANOVA)
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68W50 Evolutionary algorithms, genetic algorithms (computational aspects)
90C29 Multi-objective and goal programming
90C59 Approximation methods and heuristics in mathematical programming
Full Text: DOI
