
Algorithm portfolio selection as a bandit problem with unbounded losses. (English) Zbl 1234.68339

Summary: We propose a method that learns to allocate computation time to a given set of algorithms, of unknown performance, with the aim of solving a given sequence of problem instances in minimum time. Analogous meta-learning techniques are typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. We adopt instead an online approach, named GambleTA, in which algorithm performance models are iteratively updated and used to guide allocation on a sequence of problem instances. GambleTA is a general method for selecting among two or more alternative algorithm portfolios. Each portfolio has its own way of allocating computation time to the available algorithms, possibly based on performance models, in which case its performance is expected to improve over time, as more runtime data becomes available. The resulting exploration-exploitation trade-off is represented as a bandit problem. In our previous work, the algorithms corresponded to the arms of the bandit, and the allocations evaluated by the different portfolios were mixed using a solver for the bandit problem with expert advice. This, however, required setting an arbitrary bound on algorithm runtimes, invalidating the solver's optimal regret guarantee. In this paper, we propose a simpler version of GambleTA, in which the allocators correspond to the arms, such that a single portfolio is selected for each instance. The selection is represented as a bandit problem with partial information and an unknown bound on losses. We devise a solver for this game, proving a bound on its expected regret. We present experiments based on results from several solver competitions, in various domains, comparing GambleTA with another online method.
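The core mechanism described above can be illustrated with a short sketch. This is not the paper's actual solver; it is a minimal, hypothetical Exp3-style bandit over time allocators, combined with a doubling trick on the loss scale to cope with losses of unknown bound. All class and parameter names are illustrative assumptions.

```python
import math
import random

class UnboundedLossBandit:
    """Sketch of a bandit over K arms (e.g., portfolio allocators) whose
    losses have no known upper bound: Exp3-style exponential weighting,
    with the loss bound doubled (and weights reset) whenever an observed
    loss exceeds the current estimate of the bound."""

    def __init__(self, n_arms, gamma=0.1):
        self.n = n_arms
        self.gamma = gamma          # exploration rate
        self.bound = 1.0            # current guess at the loss bound
        self.weights = [1.0] * n_arms

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self):
        # Sample one arm (one allocator per problem instance).
        r, acc = random.random(), 0.0
        for i, p in enumerate(self.probabilities()):
            acc += p
            if r < acc:
                return i
        return self.n - 1

    def update(self, arm, loss):
        # Doubling trick: grow the bound until the loss fits, restarting
        # the weights each time, since the old scale is no longer valid.
        while loss > self.bound:
            self.bound *= 2.0
            self.weights = [1.0] * self.n
        p = self.probabilities()[arm]
        est = (loss / self.bound) / p   # importance-weighted loss estimate
        self.weights[arm] *= math.exp(-self.gamma * est / self.n)
```

A usage loop would call `select()` once per problem instance, run the chosen allocator, and feed the observed runtime back via `update()`. Resetting the weights after each doubling is a crude but simple way to keep the loss estimates on a consistent scale; the paper's own solver handles the unknown bound more carefully, with a proven regret bound.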

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68W27 Online algorithms; streaming algorithms
68Q25 Analysis of algorithms and problem complexity
62N99 Survival analysis and censored data
62G99 Nonparametric inference

Software:

SPOT; SATzilla
Full Text: DOI

References:

[1] Allenberg, C., Auer, P., Györfi, L., Ottucsák, G.: Hannan consistency in on-line learning in case of unbounded losses under partial monitoring. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds.) Algorithmic Learning Theory–ALT. LNCS, vol. 4264, pp. 229–243. Springer (2006) · Zbl 1168.68449
[2] Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002) · Zbl 1029.68087 · doi:10.1137/S0097539701398375
[3] Babai, L.: Monte-Carlo Algorithms in Graph Isomorphism Testing. Technical Report 79-10, Univ. de Montréal, Dép. de mathématiques et de statistique (1979)
[4] Battiti, R., Brunato, M., Mascia, F.: Reactive search and intelligent optimization. In: Operations Research/Computer Science Interfaces, vol. 45. Springer Verlag, Berlin (2008) · Zbl 1158.68036
[5] Cesa-Bianchi, N., Lugosi, G., Stoltz, G.: Minimizing regret with label efficient prediction. IEEE Trans. Inf. Theory 51(6), 2152–2162 (2005) · Zbl 1295.68183 · doi:10.1109/TIT.2005.847729
[6] Cesa-Bianchi, N., Mansour, Y., Stoltz, G.: Improved second-order bounds for prediction with expert advice. In: Auer, P., Meir, R. (eds.) 18th Annual Conference on Learning Theory–COLT. LNCS, vol. 3559, pp. 217–232. Springer (2005) · Zbl 1137.68525
[7] Cesa-Bianchi, N., Mansour, Y., Stoltz, G.: Improved second-order bounds for prediction with expert advice. Mach. Learn. 66(2–3), 321–352 (2007) · Zbl 1137.68525 · doi:10.1007/s10994-006-5001-7
[8] Finkelstein, L., Markovitch, S., Rivlin, E.: Optimal schedules for parallelizing anytime algorithms: the case of independent processes. In: Eighteenth National Conference on Artificial Intelligence–AAAI, pp. 719–724. AAAI Press (2002)
[9] Finkelstein, L., Markovitch, S., Rivlin, E.: Optimal schedules for parallelizing anytime algorithms: the case of shared resources. J. Artif. Intell. Res. 19, 73–138 (2003) · Zbl 1026.68159
[10] Fitzmaurice, G., Davidian, M., Verbeke, G., Molenberghs, G.: Longitudinal Data Analysis. Chapman & Hall/CRC Press (2008)
[11] Gagliolo, M., Zhumatiy, V., Schmidhuber, J.: Adaptive online time allocation to search algorithms. In: Boulicaut, J.F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) Machine Learning: ECML 2004. Proceedings of the 15th European Conference on Machine Learning. LNCS, vol. 3201, pp. 134–143. Springer (2004) · Zbl 1132.68546
[12] Gagliolo, M.: Universal search. Scholarpedia 2(11), 2575 (2007)
[13] Gagliolo, M.: Online dynamic algorithm portfolios. PhD thesis, IDSIA/University of Lugano, Lugano, Switzerland (2010) · Zbl 1214.68480
[14] Gagliolo, M., Legrand, C.: Algorithm survival analysis. In: Bartz-Beielstein, T., Chiarandini, M., Paquete, L., Preuss, M. (eds.) Experimental Methods for the Analysis of Optimization Algorithms, pp. 161–184. Springer, Berlin, Heidelberg (2010) · Zbl 1214.68480
[15] Gagliolo, M., Legrand, C., Birattari, M.: Mixed-effects modeling of optimisation algorithm performance. In: Stützle, T., Birattari, M., Hoos, H.H. (eds.) Engineering Stochastic Local Search Algorithms–SLS. LNCS, vol. 5752, pp. 150–154. Springer (2009)
[16] Gagliolo, M., Schmidhuber, J.: A neural network model for inter-problem adaptive online time allocation. In: Duch, W., et al. (eds.) Artificial Neural Networks: Formal Models and Their Applications–ICANN, Proceedings, Part 2. LNCS, vol. 3697, pp. 7–12. Springer, Berlin (2005)
[17] Gagliolo, M., Schmidhuber, J.: Learning dynamic algorithm portfolios. Ann. Math. Artif. Intell. 47(3–4), 295–328 (2006) · Zbl 1113.68101 · doi:10.1007/s10472-006-9036-z
[18] Gagliolo, M., Schmidhuber, J.: Learning restart strategies. In: Veloso, M.M. (ed.) Twentieth International Joint Conference on Artificial Intelligence–IJCAI, vol. 1, pp. 792–797. AAAI Press (2007)
[19] Gagliolo, M., Schmidhuber, J.: Towards distributed algorithm portfolios. In: Corchado, J.M., et al. (eds.) International Symposium on Distributed Computing and Artificial Intelligence–DCAI. Advances in Soft Computing, vol. 50, pp. 634–643. Springer (2008)
[20] Gagliolo, M., Schmidhuber, J.: Algorithm selection as a bandit problem with unbounded losses. In: Blum, C., Battiti, R. (eds.) Learning and Intelligent Optimization. 4th International Conference, LION 4, Venice, Italy, January 18–22, 2010. Selected Papers. Lecture Notes in Computer Science, vol. 6073, pp. 82–96. Springer, Berlin, Heidelberg (2010)
[21] Gomes, C.P., Selman, B.: Algorithm portfolios. Artif. Intell. 126(1–2), 43–62 (2001) · Zbl 0969.68047 · doi:10.1016/S0004-3702(00)00081-3
[22] Gomes, C.P., Selman, B., Crato, N., Kautz, H.: Heavy-tailed phenomena in satisfiability and constraint satisfaction problems. J. Autom. Reason. 24(1–2), 67–100 (2000) · Zbl 0967.68145 · doi:10.1023/A:1006314320276
[23] Hoos, H.H., Stützle, T.: Stochastic Local Search: Foundations & Applications. Morgan Kaufmann (2004) · Zbl 1126.68032
[24] Horvitz, E.J., Zilberstein, S.: Computational tradeoffs under bounded resources (editorial). Artif. Intell. 126(1–2), 1–4 (2001) · doi:10.1016/S0004-3702(01)00051-0
[25] Huberman, B.A., Lukose, R.M., Hogg, T.: An economics approach to hard computational problems. Science 275, 51–54 (1997) · doi:10.1126/science.275.5296.51
[26] Hutter, F., Hamadi, Y.: Parameter Adjustment Based on Performance Prediction: Towards an Instance-aware Problem Solver. Technical Report MSR-TR-2005-125, Microsoft Research, Cambridge, UK, (2005)
[27] Kaplan, E.L., Meier, P.: Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958) · Zbl 0089.14801 · doi:10.1080/01621459.1958.10501452
[28] Klein, J.P., Moeschberger, M.L.: Survival Analysis: Techniques for Censored and Truncated Data, 2nd edn. Springer, Berlin (2003) · Zbl 1011.62106
[29] Kolen, J.F.: Faster learning through a probabilistic approximation algorithm. In: IEEE International Conference on Neural Networks, vol. 1, pp. 449–454 (1988)
[30] Leyton-Brown, K., Nudelman, E., Shoham, Y.: Learning the empirical hardness of optimization problems: the case of combinatorial auctions. In: Van Hentenryck, P. (ed.) Principles and Practice of Constraint Programming–CP. LNCS, vol. 2470, pp. 556–572. Springer (2002)
[31] Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inf. Comput. 108(2), 212–261 (1994) · Zbl 0804.68121 · doi:10.1006/inco.1994.1009
[32] Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms. Inf. Process. Lett. 47(4), 173–180 (1993) · Zbl 0797.68139 · doi:10.1016/0020-0190(93)90029-9
[33] Mitchell, T.M.: Machine Learning. McGraw-Hill Science/Engineering/Math (1997)
[34] Muselli, M., Rabbia, M.: Parallel trials versus single search in supervised learning. In: Simula, O. (ed.) Second International Conference on Artificial Neural Networks–ICANN, pp. 24–28. Elsevier (1991)
[35] Nudelman, E., Leyton-Brown, K., Hoos, H.H., Devkar, A., Shoham, Y.: Understanding random SAT: beyond the clauses-to-variables ratio. In: Wallace, M. (ed.) Principles and Practice of Constraint Programming–CP. LNCS, vol. 3258, pp. 438–452. Springer (2004) · Zbl 1152.68569
[36] Petrik, M.: Learning Parallel Portfolios of Algorithms. Master’s thesis, Comenius University (2005)
[37] Petrik, M.: Statistically Optimal Combination of Algorithms. Presented at SOFSEM (2005)
[38] Petrik, M., Zilberstein, S.: Learning parallel portfolios of algorithms. Ann. Math. Artif. Intell. 48(1–2), 85–106 (2006) · Zbl 1121.68095 · doi:10.1007/s10472-007-9050-9
[39] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, New York (1994) · Zbl 0829.90134
[40] Rice, J.R.: The algorithm selection problem. In: Rubinoff, M., Yovits, M.C. (eds.) Advances in Computers, vol. 15, pp. 65–118. Academic Press (1976)
[41] Robbins, H.: Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58, 527–535 (1952) · Zbl 0049.37009 · doi:10.1090/S0002-9904-1952-09620-8
[42] Sayag, T., Fine, S., Mansour, Y.: Combining multiple heuristics. In: STACS, pp. 242–253 (2006) · Zbl 1136.68520
[43] Schaul, T., Schmidhuber, J.: Metalearning. Scholarpedia 5(6), 4650 (2010) · doi:10.4249/scholarpedia.4650
[44] Smith-Miles, K.A.: Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput. Surv. 41(1), 1–25 (2008) · doi:10.1145/1456650.1456656
[45] Streeter, M.: Using Online Algorithms to Solve NP-hard Problems more Efficiently in Practice. PhD thesis, Carnegie Mellon University (2007)
[46] Streeter, M., Smith, S.F.: New techniques for algorithm portfolio design. In: McAllester, D.A., Myllymäki, P. (eds.) 24th Conference on Uncertainty in Artificial Intelligence–UAI, pp. 519–527. AUAI Press (2008)
[47] Streeter, M.J., Golovin, D., Smith, S.F.: Combining multiple heuristics online. In: Holte, R.C., Howe, A. (eds.) Twenty-second AAAI Conference on Artificial Intelligence, pp. 1197–1203. AAAI Press (2007)
[48] Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: SATzilla-07: the design and analysis of an algorithm portfolio for SAT. In: Bessiere, C. (ed.) Principles and Practice of Constraint Programming–CP. LNCS, vol. 4741, pp. 712–727. Springer (2007)
[49] Xu, L., Hutter, F., Hoos, H.H., Leyton-Brown, K.: SATzilla: portfolio-based algorithm selection for SAT. J. Artif. Intell. Res. 32, 565–606 (2008) · Zbl 1182.68272
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.