×

Provable randomized rounding for minimum-similarity diversification. (English) Zbl 1494.68246

Summary: When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.

MSC:

68T09 Computational aspects of data analysis and big data
60C05 Combinatorial probability
90C20 Quadratic programming

Software:

Cython; LETOR; QPLIB; UCI-ml

References:

[1] Abbassi Z, Mirrokni VS, Thakur M (2013) Diversity maximization under matroid constraints, pp 32-40
[2] Ashkan A, Kveton B, Berkovsky S, Wen Z (2015) Optimal greedy diversity for recommendation
[3] Bansal N, Jain K, Kazeykina A, Naor JS (2010) Approximation algorithms for diversified search ranking. In: International colloquium on automata, languages, and programming. Springer, pp 273-284 · Zbl 1288.68016
[4] Behnel, S.; Bradshaw, R.; Citro, C.; Dalcin, L.; Seljebotn, DS; Smith, K., Cython: the best of both worlds, Comput Sci Eng, 13, 2, 31-39 (2011) · doi:10.1109/MCSE.2010.118
[5] Bhaskara A, Ghadiri M, Mirrokni V, Svensson O (2016) Linear relaxations for finding diverse elements in metric spaces. In: Advances in neural information processing systems, pp 4098-4106
[6] Billionnet, A.; Elloumi, S.; Plateau, MC, Quadratic 0-1 programming: tightening linear or quadratic convex reformulation by use of relaxations, RAIRO-Oper Res, 42, 2, 103-121 (2008) · Zbl 1211.90133 · doi:10.1051/ro:2008011
[7] Borodin, A.; Jain, A.; Lee, HC; Ye, Y., Max-sum diversification, monotone submodular functions, and dynamic updates, ACM Trans Algorithms, 13, 3, 1-25 (2017) · Zbl 1452.68273 · doi:10.1145/3086464
[8] Boyd, S.; Boyd, SP; Vandenberghe, L., Convex optimization (2004), Cambridge: Cambridge University Press, Cambridge · Zbl 1058.90049 · doi:10.1017/CBO9780511804441
[9] Calinescu G, Chekuri C, Pál M, Vondrák J (2007) Maximizing a submodular set function subject to a matroid constraint, pp 182-196 · Zbl 1136.90449
[10] Capannini G, Nardini FM, Perego R, Silvestri F (2011) Efficient diversification of web search results
[11] Carbonell J, Goldstein J (1998) The use of mmr, diversity-based reranking for reordering documents and producing summaries, pp 335-336
[12] Carter, MW, The indefinite zero-one quadratic problem, Discrete Appl Math, 7, 1, 23-44 (1984) · Zbl 0524.90061 · doi:10.1016/0166-218X(84)90111-2
[13] Chandra, B.; Halldórsson, MM, Approximation algorithms for dispersion problems, J Algorithms, 38, 2, 438-465 (2001) · Zbl 0974.68153 · doi:10.1006/jagm.2000.1145
[14] Charikar M, Li S (2012) A dependent lp-rounding approach for the k-median problem. In: International colloquium on automata, languages, and programming. Springer, pp 194-205 · Zbl 1272.90020
[15] Chekuri C, Vondrak J, Zenklusen R (2010) Dependent randomized rounding via exchange properties of combinatorial structures, vol 10. IEEE Computer Society, USA, pp 575-584. doi:10.1109/FOCS.2010.60
[16] Clarke CL, Kolla M, Cormack GV, Vechtomova O, Ashkan A, Büttcher S, MacKinnon I (2008) Novelty and diversity in information retrieval evaluation, pp 659-666
[17] Darroch, JN, On the distribution of the number of successes in independent trials, Ann Math Stat, 35, 3, 1317-1321 (1964) · Zbl 0213.44402 · doi:10.1214/aoms/1177703287
[18] Doerr B (2005) Roundings respecting hard constraints, pp 617-628 · Zbl 1118.90311
[19] Doerr, B.; Wahlström, M., Randomized rounding in the presence of a cardinality constraint, J Exp Algorithmics, 19, 1-1 (2015) · Zbl 1347.68361 · doi:10.1145/2594409
[20] Doerr B, Wahlström M (2016) How to generate randomized roundings with dependencies and how to derandomize them. In: Algorithm engineering. Springer, pp 159-184
[21] Dou Z, Hu S, Chen K, Song R, Wen JR (2011) Multi-dimensional search result diversification, pp 475-484
[22] Dua D, Graff C (2017) Uci machine learning repository
[23] Fernández, M.; Williams, S., Closed-form expression for the poisson-binomial probability density function, IEEE Trans Aerosp Electron Syst, 46, 2, 803-817 (2010) · doi:10.1109/TAES.2010.5461658
[24] Furini, F.; Traversi, E.; Belotti, P.; Frangioni, A.; Gleixner, A.; Gould, N.; Liberti, L.; Lodi, A.; Misener, R.; Mittelmann, H., Qplib: a library of quadratic programming instances, Math Program Comput, 11, 2, 237-265 (2019) · Zbl 1435.90099 · doi:10.1007/s12532-018-0147-4
[25] Gandhi R, Khuller S, Parthasarathy S, Srinivasan A (2002) Dependent rounding in bipartite graphs. In: The 43rd annual IEEE symposium on foundations of computer science, 2002 Proceedings. IEEE, pp 323-332
[26] Gollapudi S, Sharma A (2009) An axiomatic approach for result diversification, pp 381-390
[27] He J, Tong H, Mei Q, Szymanski B (2012) Gender: a generic diversified ranking algorithm. In: Advances in neural information processing systems, pp 1142-1150
[28] Hildebrand R, Weismantel R, Zemmer K (2016) An fptas for minimizing indefinite quadratic forms over integers in polyhedra, pp 1715-1723 · Zbl 1411.68191
[29] Hillion, E.; Johnson, O., A proof of the Shepp-Olkin entropy concavity conjecture, Bernoulli, 23, 4, 3638-3649 (2017) · Zbl 1390.60073 · doi:10.3150/16-BEJ860
[30] Hoeffding, W., On the distribution of the number of successes in independent trials, Ann Math Stat, 27, 3, 713-721 (1956) · Zbl 0073.13902 · doi:10.1214/aoms/1177728178
[31] Jain K, Vazirani VV (1999) Primal-dual approximation algorithms for metric facility location and k-median problems. In: 40th Annual symposium on foundations of computer science (Cat. No. 99CB37039). IEEE, pp 2-13
[32] Kalantari, B.; Bagchi, A., An algorithm for quadratic zero-one programs, Naval Res Logist, 37, 4, 527-538 (1990) · Zbl 0709.90079 · doi:10.1002/1520-6750(199008)37:4<527::AID-NAV3220370407>3.0.CO;2-P
[33] Küçüktunç O, Saule E, Kaya K, Çatalyürek ÜV (2013) Diversified recommendation on graphs: pitfalls, measures, and algorithms, pp 715-726
[34] Lima, RM; Grossmann, IE, On the solution of nonconvex cardinality boolean quadratic programming problems: a computational study, Comput Optim Appl, 66, 1, 1-37 (2017) · Zbl 1371.90088 · doi:10.1007/s10589-016-9856-7
[35] Lucchese C, Nardini FM, Perego R, Orlando S, Trani S (2018) Selective gradient boosting for effective learning to rank. In: The 41st international ACM SIGIR conference on research &amp;amp; development in information retrieval. Association for Computing Machinery, New York, NY, USA, SIGIR 2018, pp 155-164. doi:10.1145/3209978.3210048
[36] Nemhauser, GL; Wolsey, LA; Fisher, ML, An analysis of approximations for maximizing submodular set functions-I, Math Program, 14, 1, 265-294 (1978) · Zbl 0374.90045 · doi:10.1007/BF01588971
[37] Potra, FA; Wright, SJ, Interior-point methods, J Comput Appl Math, 124, 1, 281-302 (2000) · Zbl 0967.65078 · doi:10.1016/S0377-0427(00)00433-7
[38] Qin T, Liu TY (2013) Introducing letor 4.0 datasets. arXiv:1306.2597
[39] Qin, T.; Liu, TY; Xu, J.; Li, H., Letor: a benchmark collection for research on learning to rank for information retrieval, Inf Retr, 13, 4, 346-374 (2010) · doi:10.1007/s10791-009-9123-y
[40] Radlinski F, Dumais S (2006) Improving personalized web search using result diversification, pp 691-692
[41] Rafiei D, Bharat K, Shukla A (2010) Diversifying web search results, pp 781-790
[42] Raghavan, P.; Tompson, CD, Randomized rounding: a technique for provably good algorithms and algorithmic proofs, Combinatorica, 7, 4, 365-374 (1987) · Zbl 0651.90052 · doi:10.1007/BF02579324
[43] Santos RL, Macdonald C, Ounis I (2010) Selectively diversifying web search results, pp 1179-1188
[44] Srinivasan A (2001) Distributions on level-sets with applications to approximation algorithms, pp 588-597
[45] Tong H, He J, Wen Z, Konuru R, Lin CY (2011) Diversified ranking on large graphs: an optimization viewpoint, pp 1028-1036
[46] Tsaparas P, Ntoulas A, Terzi E (2011) Selecting a comprehensive set of reviews, pp 168-176
[47] Vavasis, SA, Approximation algorithms for indefinite quadratic programming, Math Program, 57, 1-3, 279-311 (1992) · Zbl 0845.90095 · doi:10.1007/BF01581085
[48] Wang YH (1993) On the number of successes in independent trials. Statistica Sinica 295-312 · Zbl 0822.60006
[49] Watrigant R, Bougeret M, Giroudeau R (2012) The k-sparsest subgraph problem. HAL Arch Ouvertes · Zbl 1416.68141
[50] Williamson, DP; Shmoys, DB, The design of approximation algorithms (2011), Cambridge: Cambridge University Press, Cambridge · Zbl 1219.90004 · doi:10.1017/CBO9780511921735
[51] Zadeh SA, Ghadiri M (2015) Max-sum diversification, monotone submodular functions and semi-metric spaces. arXiv preprint arXiv:151102402
[52] Zhang M, Hurley N (2008) Avoiding monotony: improving the diversity of recommendation lists, pp 123-130
[53] Zhang, S., Quadratic maximization and semidefinite relaxation, Math Program, 87, 3, 453-465 (2000) · Zbl 1009.90080 · doi:10.1007/s101070050006
[54] Ziegler CN, McNee SM, Konstan JA, Lausen G (2005) Improving recommendation lists through topic diversification, pp 22-32
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.