
Exploiting time-varying RFM measures for customer churn prediction with deep neural networks. (English) Zbl 07898271

Summary: Deep neural network (DNN) architectures such as recurrent neural networks and transformers display outstanding performance in modeling sequential unstructured data. However, little is known about their merit for modeling customer churn with time-varying data. The paper provides a comprehensive evaluation of the ability of recurrent neural networks and transformers to perform customer churn prediction (CCP) using time-varying behavioral features in the form of recency, frequency, and monetary value (RFM). RFM variables are the backbone of CCP and, more generally, of customer behavior forecasting. We examine alternative strategies for integrating time-varying and time-invariant customer features in one network architecture. In this context, we also assess hybrid approaches that incorporate the outputs of DNNs in conventional CCP models. Using a comprehensive panel data set from a large financial services company, we find recurrent neural networks to outperform transformer architectures when focusing on time-varying RFM features. This finding is confirmed when time-invariant customer features are included, independent of the specific form of feature integration. Finally, we find no statistical evidence that hybrid approaches (based on regularized logistic regression and extreme gradient boosting) improve predictive performance – highlighting that DNNs, and especially recurrent neural networks, are suitable standalone classifiers for CCP using time-varying RFM measures.
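The time-varying RFM features described above can be illustrated with a minimal sketch: for each customer and each observation period, one computes recency (days since the last purchase as of the period's end), frequency (number of purchases in the period), and monetary value (amount spent in the period), yielding one feature sequence per customer that a recurrent network can consume. The function below is a hypothetical illustration, not the paper's actual feature pipeline; the transaction format, the period windows, and the sentinel recency value for customers with no purchase history are all assumptions.

```python
from collections import defaultdict

# Sentinel recency for customers with no observed purchase yet (assumption).
NO_PURCHASE = 10_000


def rfm_sequences(transactions, periods):
    """Build per-period (recency, frequency, monetary) sequences per customer.

    transactions: iterable of (customer_id, day, amount) tuples.
    periods: list of (start_day, end_day) windows, inclusive, in time order.
    Recency = days between the window's end and the customer's most recent
    purchase observed up to that point (NO_PURCHASE if none yet).
    """
    by_customer = defaultdict(list)
    for cid, day, amount in transactions:
        by_customer[cid].append((day, amount))

    features = {}
    for cid, txns in by_customer.items():
        seq, last_day = [], None
        for start, end in periods:
            in_window = [(d, a) for d, a in txns if start <= d <= end]
            freq = len(in_window)                      # purchases this period
            monetary = sum(a for _, a in in_window)    # spend this period
            if in_window:                              # update last purchase
                newest = max(d for d, _ in in_window)
                last_day = newest if last_day is None else max(last_day, newest)
            recency = end - last_day if last_day is not None else NO_PURCHASE
            seq.append((recency, freq, monetary))
        features[cid] = seq
    return features
```

Each customer's sequence then has shape (number of periods, 3) and can be stacked into the three-dimensional input tensor that recurrent or transformer layers expect, with time-invariant customer features concatenated separately, e.g. to the final hidden state.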

MSC:

68Txx Artificial intelligence
90Bxx Operations research and management science
62Hxx Multivariate analysis

References:

[1] Bouckaert, R. R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms—advances in knowledge discovery and data mining. In H. Dai, R. Srikant, & C. Zhang (Eds.), Proceedings of the Pacific-Asia conference on knowledge discovery and data mining (PAKDD) 2004 (pp. 3-12). Springer.
[2] Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R., An attentive survey of attention models, ACM Transactions on Intelligent Systems and Technology, 12, 5, 1-32, 2021 · doi:10.1145/3465055
[3] Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785-794). Association for Computing Machinery.
[4] Chen, ZY; Fan, ZP; Sun, M., A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data, European Journal of Operational Research, 223, 2, 461-472, 2012 · Zbl 1292.68131 · doi:10.1016/j.ejor.2012.06.040
[5] Cho, K., van Merriënboer, B., Gulcehre, C., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the conference on empirical methods in natural language processing (EMNLP 2014).
[6] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of the NIPS 2014 workshop on deep learning, December 2014.
[7] De Caigny, A.; Coussement, K.; De Bock, KW, A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees, European Journal of Operational Research, 269, 2, 760-772, 2018 · Zbl 1388.90061 · doi:10.1016/j.ejor.2018.02.009
[8] De Caigny, A.; Coussement, K.; De Bock, KW; Lessmann, S., Incorporating textual information in customer churn prediction models based on a convolutional neural network, International Journal of Forecasting, 36, 4, 1563-1578, 2020 · doi:10.1016/j.ijforecast.2019.03.029
[9] Galassi, A.; Lippi, M.; Torroni, P., Attention in natural language processing, IEEE Transactions on Neural Networks and Learning Systems, 32, 10, 4291-4308, 2021 · doi:10.1109/TNNLS.2020.3019893
[10] Gattermann-Itschert, T.; Thonemann, UW, How training on multiple time slices improves performance in churn prediction, European Journal of Operational Research, 295, 664-674, 2021 · Zbl 1487.90409 · doi:10.1016/j.ejor.2021.05.035
[11] Goodfellow, I.; Bengio, Y.; Courville, A., Deep Learning, 2016, MIT Press · Zbl 1373.68009
[12] Gunnarsson, BR; Vanden Broucke, S.; Baesens, B.; Óskarsdóttir, M.; Lemahieu, W., Deep learning for credit scoring: Do or don’t?, European Journal of Operational Research, 295, 1, 292-305, 2021 · Zbl 1487.91147 · doi:10.1016/j.ejor.2021.03.006
[13] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer. · Zbl 1273.62005
[14] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[15] Janssens, B., Bogaert, M., Bagué, A., & Van den Poel, D. (2022). B2Boost: Instance-dependent profit-driven modelling of B2B churn. Annals of Operations Research, 1, 1-27.
[16] Koehn, D.; Lessmann, S.; Schaal, M., Predicting online shopping behaviour from clickstream data using deep learning, Expert Systems with Applications, 150, 2020 · doi:10.1016/j.eswa.2020.113342
[17] Li, J., A two-step rejection procedure for testing multiple hypotheses, Journal of Statistical Planning and Inference, 138, 6, 1521-1527, 2008 · Zbl 1131.62067 · doi:10.1016/j.jspi.2007.04.032
[18] Liu, X., Xie, M., Wen, X., Chen, R., Ge, Y., Duffield, N., & Wang, N. (2018). A semi-supervised and inductive embedding model for churn prediction of large-scale mobile games. In Proceedings of the 2018 IEEE international conference on data mining (ICDM) (pp. 277-286).
[19] Luong, T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1412-1421). Association for Computational Linguistics.
[20] McCarthy, DM; Fader, PS; Hardie, BGS, Valuing subscription-based businesses using publicly disclosed customer data, Journal of Marketing, 81, 1, 17-35, 2017 · doi:10.1509/jm.15.0519
[21] Óskarsdóttir, M.; Bravo, C.; Verbeke, W.; Sarraute, C.; Baesens, B.; Vanthienen, J., Social network analytics for churn prediction in telco: Model building, evaluation and network architecture, Expert Systems with Applications, 85, 204-220, 2017 · doi:10.1016/j.eswa.2017.05.028
[22] Qi, J.; Zhang, L.; Liu, Y.; Li, L.; Zhou, Y.; Shen, Y., ADTreesLogit model for customer churn prediction, Annals of Operations Research, 168, 247-265, 2009 · Zbl 1179.90037 · doi:10.1007/s10479-008-0400-8
[23] Risselada, H.; Verhoef, PC; Bijmolt, THA, Staying power of churn prediction models, Journal of Interactive Marketing, 24, 198-208, 2010 · doi:10.1016/j.intmar.2010.04.002
[24] Rush, A. (2018). The annotated transformer. In Proceedings of the workshop for NLP open source software (NLP-OSS) (pp. 52-60). Association for Computational Linguistics.
[25] Rust, RT; Lemon, KN; Zeithaml, VA, Return on marketing: using customer equity to focus marketing strategy, Journal of Marketing, 68, 1, 109-127, 2004 · doi:10.1509/jmkg.68.1.109.24030
[26] Schweidel, DA; Park, YH; Jamal, Z., A multiactivity latent attrition model for customer base analysis, Marketing Science, 33, 2, 273-286, 2014 · doi:10.1287/mksc.2013.0832
[27] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Proceedings of the 27th international conference on neural information processing systems—volume 2 (pp. 3104-3112). MIT Press.
[28] Tan, F., Wei, Z., He, J., Wu, X., Peng, B., Liu, H., & Yan, Z. (2018). A blended deep learning approach for predicting user intended actions. In Proceedings of the 2018 IEEE international conference on data mining (ICDM) (pp. 487-496).
[29] Van Nguyen, T.; Zhou, L.; Chong, AYL; Li, B.; Pu, X., Predicting customer demand for remanufactured products: A data-mining approach, European Journal of Operational Research, 281, 3, 543-558, 2020 · doi:10.1016/j.ejor.2019.08.015
[30] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. In I. Guyon, U. V Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Proceedings of the Advances in neural information processing systems (Vol. 30). Curran Associates, Inc.
[31] Verbeke, W.; Dejaeger, K.; Martens, D.; Hur, J.; Baesens, B., New insights into churn prediction in the telecommunication sector: A profit driven data mining approach, European Journal of Operational Research, 218, 1, 211-229, 2012 · doi:10.1016/j.ejor.2011.09.031
[32] Verbraken, T.; Verbeke, W.; Baesens, B., A novel profit maximizing metric for measuring classification performance of customer churn prediction models, IEEE Transactions on Knowledge and Data Engineering, 25, 5, 961-973, 2013 · doi:10.1109/TKDE.2012.50
[33] Wangperawong, A., Brun, C., Laudy, O., & Pavasuthipaisit, R. (2016). Churn analysis using deep convolutional neural networks and autoencoders. arXiv.org, stat.ML.
[34] Wei, CP; Chiu, IT, Turning telecommunications call details to churn prediction: A data mining approach, Expert Systems with Applications, 23, 2, 103-112, 2002 · doi:10.1016/S0957-4174(02)00030-1
[35] Wu, Z., Jing, L., Wu, B., & Jin, L. (2022). A PCA-AdaBoost model for E-commerce customer churn prediction. Annals of Operations Research, 1, 1-18.
[36] Yang, C., Shi, X., Jie, L., & Han, J. (2018). I know you’ll be back: Interpretable new user clustering and churn prediction on a mobile social application. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 914-922). Association for Computing Machinery.
[37] Zaratiegui, J., Montoro, A., & Castanedo, F. (2015). Performing highly accurate predictions through convolutional networks for actual telecommunication challenges. In Proceedings of the international conference on computer vision and pattern recognition (Vol. abs/1511.0, pp. 1-8).
[38] Zhang, Y.; Bradlow, ET; Small, DS, Predicting customer value using clumpiness: From RFM to RFMC, Marketing Science, 34, 2, 195-208, 2015 · doi:10.1287/mksc.2014.0873
[39] Zhou, J., Yan, J., Yang, L., Wang, M., & Xia, P. (2019). Customer churn prediction model based on LSTM and CNN in music streaming. In Proceedings of the 2019 international conference on advanced electrical, mechatronics and computer engineering (AEMCE 2019) (pp. 254-261).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible without claiming completeness or a perfect matching.