
Bandit learning in decentralized matching markets. (English) Zbl 07626726

Summary: We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple-player setting with competition. We introduce a new algorithm for this setting that, over a time horizon \(T\), attains \(\mathcal{O}(\log(T))\) stable regret when the preferences of the arms over players are shared, and \(\mathcal{O}(\log^2(T))\) stable regret when no assumptions are placed on the preferences of either side. Moreover, in the setting where a single player may deviate, we show that the algorithm is incentive-compatible whenever the arms’ preferences are shared, but not necessarily so when preferences are fully general.
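The algorithm itself is not reproduced in this summary; the following is a minimal Python sketch of the *setting* only, under hypothetical simplifying assumptions not taken from the paper: three players and three arms, Gaussian rewards with made-up means, shared arm preferences modeled as every arm ranking players by index, and a plain UCB proposal rule standing in for the paper's actual method. Rejected players receive no reward and no feedback, which is the decentralized, no-communication aspect of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_arms, horizon = 3, 3, 2000

# Hypothetical mean rewards: mu[i, j] = player i's mean reward from arm j.
mu = np.array([[0.9, 0.2, 0.1],
               [0.3, 0.8, 0.2],
               [0.1, 0.3, 0.7]])

# "Shared" arm preferences, modeled here as: every arm ranks players
# 0 > 1 > 2, so a conflict at an arm is won by the lower-indexed proposer.

counts = np.zeros((n_players, n_arms))  # pulls per (player, arm) pair
sums = np.zeros((n_players, n_arms))    # accumulated rewards
matches = []

for t in range(1, horizon + 1):
    # Each player independently proposes to the arm with the highest
    # UCB index (unvisited arms get index +inf and are tried first).
    safe = np.maximum(counts, 1)
    ucb = np.where(counts > 0,
                   sums / safe + np.sqrt(2 * np.log(t) / safe),
                   np.inf)
    proposals = ucb.argmax(axis=1)

    # Each arm accepts its most preferred proposer; the rest are rejected
    # and learn nothing this round.
    matched = {}
    for i in range(n_players):
        j = proposals[i]
        if j not in matched:  # lower player index wins the conflict
            matched[j] = i

    round_match = [-1] * n_players
    for j, i in matched.items():
        reward = mu[i, j] + 0.1 * rng.standard_normal()
        counts[i, j] += 1
        sums[i, j] += reward
        round_match[i] = j
    matches.append(round_match)

# Late-horizon behavior: the top-ranked player is never rejected, so it
# faces an ordinary bandit problem and should settle on its best arm.
late = [m[0] for m in matches[-500:]]
print(max(set(late), key=late.count))
```

In this toy instance the top-ranked player effectively runs a standard single-agent UCB, while lower-ranked players must learn around the rounds in which they are rejected; the logarithmic stable-regret guarantees in the paper concern this harder decentralized problem, not the naive rule sketched here.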

MSC:

68T05 Learning and adaptive systems in artificial intelligence
