Document Zbl 1300.68043

Letham, Benjamin; Rudin, Cynthia; Madigan, David

Sequential event prediction. (English) Zbl 1300.68043

Mach. Learn. 93, No. 2-3, 357-380 (2013).

Summary: In sequential event prediction, we are given a “sequence database” of past event sequences to learn from, and we aim to predict the next event within a current event sequence. We focus on applications where the set of the past events has predictive power and not the specific order of those past events. Such applications arise in recommender systems, equipment maintenance, medical informatics, and in other domains. Our formalization of sequential event prediction draws on ideas from supervised ranking. We show how specific choices within this approach lead to different sequential event prediction problems and algorithms. In recommender system applications, the observed sequence of events depends on user choices, which may be influenced by the recommendations, which are themselves tailored to the user’s choices. This leads to sequential event prediction algorithms involving a non-convex optimization problem. We apply our approach to an online grocery store recommender system, email recipient recommendation, and a novel application in the health event prediction domain.

Cited in 2 Documents

MSC:

68T05	Learning and adaptive systems in artificial intelligence
68T35	Theory of languages and software systems (knowledge-based systems, expert systems, etc.) for artificial intelligence

Keywords:

sequential event prediction; supervised ranking; recommender systems

Software:

LBFGS-B; L-BFGS-B; SVMstruct; FP-growth; L-BFGS

Cite Review PDF

Full Text: DOI Link

References:

[1]	Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734-749. · doi:10.1109/TKDE.2005.99
[2]	Agichtein, E.; Brill, E.; Dumais, S., Improving web search ranking by incorporating user behavior information, 19-26 (2006) · doi:10.1145/1148170.1148177
[3]	Agichtein, E.; Brill, E.; Dumais, S.; Ragno, R., Learning user interaction models for predicting web search result preferences, 3-10 (2006) · doi:10.1145/1148170.1148175
[4]	Berchtold, A., & Raftery, A. E. (2002). The mixture transition distribution model for high-order Markov chains and non-Gaussian time series. Statistical Science, 17(3), 328-356. · Zbl 1013.62088 · doi:10.1214/ss/1042727943
[5]	Bertsekas, D. P. (1995). Nonlinear programming. Belmont: Athena Scientific. · Zbl 1042.68098
[6]	Bigal, M. E., Liberman, J. N., & Lipton, R. B. (2006). Obesity and migraine. Neurology, 66(4), 545-550. · doi:10.1212/01.wnl.0000197218.05284.82
[7]	Borgelt, C., An implementation of the FP-growth algorithm, 1-5 (2005) · doi:10.1145/1133905.1133907
[8]	Bottou, L., Large-scale machine learning with stochastic gradient descent, 177-187 (2010) · Zbl 1436.68293
[9]	Burges, C.; Shaked, T.; Renshaw, E.; Lazier, A.; Deeds, M.; Hamilton, N.; Hullender, G., Learning to rank using gradient descent, 89-96 (2005) · doi:10.1145/1102351.1102363
[10]	Byrd, R. H., Lu, P., Nocedal, J., & Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5), 1190-1208. · Zbl 0836.65080 · doi:10.1137/0916069
[11]	Cao, Z.; Qin, T.; Liu, T. Y.; Tsai, M. F.; Li, H., Learning to rank: from pairwise approach to listwise approach, 129-136 (2007) · doi:10.1145/1273496.1273513
[12]	Carvalho, V. R.; Cohen, W. W., Ranking users for intelligent message addressing, 321-333 (2008)
[13]	Chang, A., Rudin, C., Cavaretta, M., Thomas, R., & Chou, G. (2012). How to reverse-engineer quality rankings. Machine Learning, 88(3), 369-398. · doi:10.1007/s10994-012-5295-6
[14]	Chapelle, O., & Keerthi, S. S. (2010). Efficient algorithms for ranking with SVMs. Information Retrieval, 13(3), 201-215. · doi:10.1007/s10791-009-9109-9
[15]	Clémençon, S.; Vayatis, N., Empirical performance maximization for linear rank statistics, 305-312 (2008)
[16]	Clémençon, S., Lugosi, G., & Vayatis, N. (2008). Ranking and empirical minimization of U-statistics. Annals of Statistics, 36(2), 844-874. · Zbl 1181.68160 · doi:10.1214/009052607000000910
[17]	Cohen, W. W., Schapire, R. E., & Singer, Y. (1999). Learning to order things. The Journal of Artificial Intelligence Research, 10, 243-270. · Zbl 0915.68031
[18]	Cossock, D., & Zhang, T. (2008). Statistical analysis of Bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11), 5140-5154. · Zbl 1319.62042 · doi:10.1109/TIT.2008.929939
[19]	Davis, D. A., Chawla, N. V., Christakis, N. A., & Barabasi, A. L. (2010). Time to CARE: a collaborative engine for practical disease prediction. Data Mining and Knowledge Discovery, 20, 388-415. · doi:10.1007/s10618-009-0156-z
[20]	Dekel, O.; Manning, C. D.; Singer, Y., Log-linear models for label ranking, 497-504 (2004)
[21]	Dom, B.; Eiron, I.; Cozzi, A.; Zhang, Y., Graph-based ranking algorithms for e-mail expertise analysis, 42-48 (2003) · doi:10.1145/882082.882093
[22]	Duan, L., Street, W., & Xu, E. (2011). Healthcare information systems: data mining methods in the creation of a clinical recommender system. Enterprise Information Systems, 5(2), 169-181. · doi:10.1080/17517575.2010.541287
[23]	Enright, C.; Madden, M.; Madden, N.; Laffey, J.; Peleg, M. (ed.); Lavrac, N. (ed.); Combi, C. (ed.), Clinical time series data analysis using mathematical models and DBNs, No. 6747, 159-168 (2011), Berlin · doi:10.1007/978-3-642-22218-4_20
[24]	Freund, Y., Iyer, R., Schapire, R. E., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933-969. · Zbl 1098.68652
[25]	Fürnkranz, J.; Hüllermeier, E., Pairwise preference learning and ranking, 145-156 (2003) · Zbl 1257.68126
[26]	Herbrich, R.; Graepel, T.; Obermayer, K., Support vector learning for ordinal regression, 97-102 (1999) · doi:10.1049/cp:19991091
[27]	Herlocker, J. L.; Konstan, J. A.; Borchers, A.; Riedl, J., An algorithmic framework for performing collaborative filtering, 230-237 (1999) · doi:10.1145/312624.312682
[28]	Hu, Q.; Huang, X.; Melek, W.; Kurian, C.; Yao, Y. (ed.); Sun, R. (ed.); Poggio, T. (ed.); Liu, J. (ed.); Zhong, N. (ed.); Huang, J. (ed.), A time series based method for analyzing and predicting personalized medical data, 288-298 (2010), Berlin · doi:10.1007/978-3-642-15314-3_27
[29]	Hüllermeier, E., Fürnkranz, J., Cheng, W., & Brinker, K. (2008). Label ranking by learning pairwise preferences. Artificial Intelligence, 172(16-17), 1897-1916. · Zbl 1184.68403 · doi:10.1016/j.artint.2008.08.002
[30]	ICCBR (2011). International conference on case-based reasoning (ICCBR) computer cooking contest recipe book. http://liris.cnrs.fr/ccc/ccc2011/.
[31]	Järvelin, K.; Kekäläinen, J., IR evaluation methods for retrieving highly relevant documents, 41-48 (2000) · doi:10.1145/345508.345545
[32]	Joachims, T., Optimizing search engines using clickthrough data, 133-142 (2002)
[33]	Lebanon, G.; Lafferty, J., Cranking: combining rankings using conditional probability models on permutations, 363-370 (2002)
[34]	Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76-80. · doi:10.1109/MIC.2003.1167344
[35]	McCormick, T. H., Rudin, C., & Madigan, D. (2012). Bayesian hierarchical modeling for predicting medical conditions. Annals of Applied Statistics, 6(2), 652-668. · Zbl 1243.62036 · doi:10.1214/11-AOAS522
[36]	McSherry, F.; Najork, M., Computing information retrieval performance measures efficiently in the presence of tied scores, 414-421 (2008)
[37]	Pal, C.; McCallum, A., Cc prediction with graphical models (2006)
[38]	Radlinski, F.; Kleinberg, R.; Joachims, T., Learning diverse rankings with multi-armed bandits, 784-791 (2008) · doi:10.1145/1390156.1390255
[39]	Roth, M.; Ben-David, A.; Deutscher, D.; Flysher, G.; Horn, I.; Leichtberg, A.; Leiser, N.; Matias, Y.; Merom, R., Suggesting friends using the implicit social graph, 233-242 (2010) · doi:10.1145/1835804.1835836
[40]	Rudin, C. (2009). The P-norm Push: a simple convex ranking algorithm that concentrates at the top of the list. Journal of Machine Learning Research, 10, 2233-2271. · Zbl 1235.68185
[41]	Rudin, C., & Schapire, R. E. (2009). Margin-based ranking and an equivalence between AdaBoost and RankBoost. Journal of Machine Learning Research, 10, 2193-2232. · Zbl 1235.68186
[42]	Rudin, C.; Letham, B.; Salleb-Aouissi, A.; Kogan, E.; Madigan, D., Sequential event prediction with association rules, 615-634 (2011)
[43]	Rudin, C., Letham, B., Kogan, E., & Madigan, D. (2012) A learning theory framework for sequential event prediction and association rules. (Working paper OR 394-12). Massachusetts Institute of Technology, Operations Research Center. · Zbl 1317.68184
[44]	Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J., Item-based collaborative filtering recommendation algorithms, 285-295 (2001)
[45]	Senecal, S., & Nantel, J. (2004). The influence of online product recommendations on consumers’ online choices. Journal of Retailing, 80, 159-169. · doi:10.1016/j.jretai.2004.04.001
[46]	Shalev-Shwartz, S., & Singer, Y. (2006). Efficient learning of label ranking by soft projections onto polyhedra. Journal of Machine Learning Research, 7, 1567-1599. · Zbl 1222.68302
[47]	Shani, G., Heckerman, D., & Brafman, R. I. (2005). An MDP-based recommender system. Journal of Machine Learning Research, 6, 1265-1295. · Zbl 1222.68406
[48]	Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289-310. · Zbl 1329.62045 · doi:10.1214/10-STS330
[49]	Stahl, F., & Johansson, R. (2009). Diabetes mellitus modeling and short-term prediction based on blood glucose measurements. Mathematical Biosciences, 217(2), 101-117. · Zbl 1157.92021 · doi:10.1016/j.mbs.2008.10.008
[50]	Yan, R.; Hauptmann, A. G., Efficient margin-based rank learning algorithms for information retrieval, 113-122 (2006) · doi:10.1007/11788034_12
[51]	Yue, Y.; Finley, T.; Radlinski, F.; Joachims, T., A support vector method for optimizing average precision, 271-278 (2007) · doi:10.1145/1277741.1277790
[52]	Zhu, C., Byrd, R. H., Lu, P., & Nocedal, J. (1997). Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Transactions on Mathematical Software, 23(4), 550-560. · Zbl 0912.65057 · doi:10.1145/279232.279236

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.