×

Optimal analysis for bandit learning in matching markets with serial dictatorship. (English) Zbl 07898964

Summary: The problem of two-sided matching markets is well-studied in computer science and economics, owing to its diverse applications across numerous domains. Since market participants are usually uncertain about their preferences in various online matching platforms, an emerging line of research is dedicated to the online setting where one-side participants (players) learn their unknown preferences through multiple rounds of interactions with the other side (arms). Sankararaman et al. [23] provide an \(\Omega \left( \frac{N \log ( T )}{\Delta^2} + \frac{K \log ( T )}{\Delta} \right)\) regret lower bound for this problem under serial dictatorship assumption, where \(N\) is the number of players, \(K(\geq N)\) is the number of arms, \( \Delta\) is the minimum reward gap across players and arms, and \(T\) is the time horizon. Serial dictatorship assumes arms have the same preferences, which is common in reality when one side participants have a unified evaluation standard. Recently, the work of Kong and Li [10] proposes the ET-GS algorithm and achieves an \(O \left( \frac{K \log ( T )}{\Delta^2} \right)\) regret upper bound, which is the best upper bound attained so far. Nonetheless, a gap between the lower and upper bounds, ranging from \(N\) to \(K\), persists. It remains unclear whether the lower bound or the upper bound needs to be improved. In this paper, we propose a multi-level successive selection algorithm that obtains an \(O \left( \frac{N \log ( T )}{\Delta^2} + \frac{K \log ( T )}{\Delta} \right)\) regret bound when the market satisfies serial dictatorship. To the best of our knowledge, we are the first to propose an algorithm that matches the lower bound in the problem of matching markets with bandits.

MSC:

68Qxx Theory of computing
Full Text: DOI

References:

[1] Basu, Soumya; Sankararaman, Karthik Abinav; Sankararaman, Abishek, Beyond \(\log^2\) (t) regret for decentralized bandits in matching markets, (International Conference on Machine Learning, 2021, PMLR), 705-715
[2] Cen, Sarah H.; Shah, Devavrat, Regret, stability & fairness in matching markets with bandit learners, (International Conference on Artificial Intelligence and Statistics, 2022, PMLR), 8938-8968
[3] Clark, Simon, The uniqueness of stable matchings, Contrib. Theor. Econ., 6, 1, 2006
[4] Dai, Xiaowu; Jordan, Michael, Learning in multi-stage decentralized matching markets, Adv. Neural Inf. Process. Syst., 34, 12798-12809, 2021
[5] Dai, Xiaowu; Jordan, Michael I., Learning strategies in decentralized matching markets under uncertain preferences, J. Mach. Learn. Res., 22, 1, 11806-11855, 2021 · Zbl 07626775
[6] Das, Sanmay; Kamenica, Emir, Two-sided bandits and the dating market, (Proceedings of the 19th International Joint Conference on Artificial Intelligence, 2005), 947-952
[7] Gale, David; Shapley, Lloyd S., College admissions and the stability of marriage, Am. Math. Mon., 69, 1, 9-15, 1962 · Zbl 0109.24403
[8] Ghosh, Avishek; Sankararaman, Abishek; Ramchandran, Kannan; Javidi, Tara; Mazumdar, Arya, Decentralized competing bandits in non-stationary matching markets, 2022, arXiv preprint
[9] Jagadeesan, Meena; Wei, Alexander; Wang, Yixin; Jordan, Michael; Steinhardt, Jacob, Learning equilibria in matching markets from bandit feedback, Adv. Neural Inf. Process. Syst., 34, 3323-3335, 2021
[10] Kong, Fang; Li, Shuai, Player-optimal stable regret for bandit learning in matching markets, (Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2023, SIAM), 1512-1522 · Zbl 07848015
[11] Kong, Fang; Li, Shuai, Improved bandits in many-to-one matching markets with incentive compatibility, (Proceedings of the AAAI Conference on Artificial Intelligence, 2024), 38:13256-13264, 03
[12] Kong, Fang; Yin, Junming; Li, Shuai, Thompson sampling for bandit learning in matching markets, (De Raedt, Lud, Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, International Joint Conferences on Artificial Intelligence Organization, vol. 7, 2022), 3164-3170, Main Track
[13] Lattimore, Tor; Szepesvári, Csaba, Bandit Algorithms, 2020, Cambridge University Press · Zbl 1439.68002
[14] Li, Yuantong; Wang, Chi-hua; Cheng, Guang; Sun, Will Wei, Rate-optimal contextual online matching bandit, 2022, arXiv preprint
[15] Liu, Lydia T.; Mania, Horia; Jordan, Michael, Competing bandits in matching markets, (International Conference on Artificial Intelligence and Statistics, 2020, PMLR), 1618-1628
[16] Liu, Lydia T.; Ruan, Feng; Mania, Horia; Jordan, Michael I., Bandit learning in decentralized matching markets, J. Mach. Learn. Res., 22, 1-34, 2021 · Zbl 07626726
[17] Maheshwari, Chinmay; Sastry, Shankar; Mazumdar, Eric, Decentralized, communication- and coordination-free learning in structured matching markets, (Advances in Neural Information Processing Systems, 2022)
[18] Min, Yifei; Wang, Tianhao; Xu, Ruitu; Wang, Zhaoran; Jordan, Michael I.; Yang, Zhuoran, Learn to match with no regret: reinforcement learning in Markov matching markets, 2022, arXiv preprint
[19] Muthirayan, Deepan; Maheshwari, Chinmay; Khargonekar, Pramod P.; Sastry, Shankar, Competing bandits in time varying matching markets, 2022, arXiv preprint
[20] Roth, Alvin E., The evolution of the labor market for medical interns and residents: a case study in game theory, J. Polit. Econ., 92, 6, 991-1016, 1984
[21] Roth, Alvin E., The economist as engineer: game theory, experimentation, and computation as tools for design economics, Econometrica, 70, 4, 1341-1378, 2002 · Zbl 1137.91339
[22] Roth, Alvin E.; Sotomayor, Marilda, Two-sided matching, (Handbook of Game Theory with Economic Applications, vol. 1, 1992), 485-541 · Zbl 0968.91516
[23] Sankararaman, Abishek; Basu, Soumya; Sankararaman, Karthik Abinav, Dominate or delete: decentralized competing bandits in serial dictatorship, (International Conference on Artificial Intelligence and Statistics, 2021, PMLR), 1252-1260
[24] Su, Yi; Bayoumi, Magd; Joachims, Thorsten, Optimizing rankings for recommendation in matching markets, (Proceedings of the ACM Web Conference 2022, 2022), 328-338
[25] Wang, Zilong; Guo, Liya; Yin, Junming; Li, Shuai, Bandit learning in many-to-one matching markets, (Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022), 2088-2097
[26] Zhang, Yirui; Wang, Siwei; Fang, Zhixuan, Matching in multi-arm bandit with collision, (Advances in Neural Information Processing Systems, 2022)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.