How training on multiple time slices improves performance in churn prediction. (English) Zbl 1487.90409

Summary: Customer churn prediction models using machine learning classification have been developed predominantly by training and testing on one time slice of data. We train models on multiple time slices of data and refer to this approach as multi-slicing. Our results show that given the same time frame of data, multi-slicing significantly improves churn prediction performance compared to training on the entire data set as one time slice. We demonstrate that besides an increased training set size, the improvement is driven by training on samples from different time slices. For data from a convenience wholesaler, we show that multi-slicing addresses the rarity of churn samples and the risk of overfitting to the distinctive situation in a single training time slice. Multi-slicing makes a model more generalizable, which is particularly relevant whenever conditions change or fluctuate over time. We also discuss how to choose the number of time slices.
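The multi-slicing idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. The transaction format, the window lengths, the single purchase-count feature, and the names `build_slice` and `multi_slice` are all assumptions made for illustration. Each slice origin yields one labeled sample per customer active in the feature window before it, and multi-slicing stacks the samples from several origins.

```python
from collections import defaultdict

def build_slice(transactions, origin, feature_days, label_days):
    """Build labeled samples for one time slice (hypothetical helper).

    Features come from purchases in [origin - feature_days, origin).
    A customer is labeled as churned (1) if they make no purchase in
    [origin, origin + label_days). Only customers active in the feature
    window are included.
    """
    purchase_counts = defaultdict(int)
    active_later = set()
    for customer, day in transactions:
        if origin - feature_days <= day < origin:
            purchase_counts[customer] += 1
        elif origin <= day < origin + label_days:
            active_later.add(customer)
    samples = []
    for customer, count in sorted(purchase_counts.items()):
        features = {"customer": customer, "origin": origin, "n_purchases": count}
        label = int(customer not in active_later)
        samples.append((features, label))
    return samples

def multi_slice(transactions, origins, feature_days, label_days):
    """Stack the samples of several time slices into one training set."""
    samples = []
    for origin in origins:
        samples.extend(build_slice(transactions, origin, feature_days, label_days))
    return samples

# Toy transaction log of (customer_id, day) pairs; invented data.
transactions = [("a", 5), ("a", 35), ("a", 65), ("b", 10), ("b", 40), ("c", 20)]

single = build_slice(transactions, origin=60, feature_days=30, label_days=30)
multi = multi_slice(transactions, origins=[30, 60], feature_days=30, label_days=30)

print(len(single), sum(label for _, label in single))
print(len(multi), sum(label for _, label in multi))
```

In this toy log the stacked set holds more samples and more churn examples than the single slice, and the same customer can appear once per slice with a different label. This mirrors the two effects the summary names: a larger training set and exposure to different time periods.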

MSC:

90B60 Marketing, advertising
68T05 Learning and adaptive systems in artificial intelligence

Software:

SMOTE
References:

[1] Anderson, E.; Weitz, B., Determinants of continuity in conventional industrial channel dyads, Marketing Science, 8, 4, 310-323 (1989)
[2] Ballings, M.; Van den Poel, D., Customer event history for churn prediction: How long is long enough?, Expert Systems with Applications, 39, 18, 13517-13522 (2012)
[3] Ballings, M.; Van den Poel, D., CRM in social media: Predicting increases in Facebook usage frequency, European Journal of Operational Research, 244, 1, 248-260 (2015) · Zbl 1346.90412
[4] Bose, I.; Chen, X., Quantitative models for direct marketing: A review from systems perspective, European Journal of Operational Research, 195, 1, 1-16 (2009) · Zbl 1159.90434
[5] Breiman, L., Random forests, Machine Learning, 45, 1, 5-32 (2001) · Zbl 1007.68152
[6] Buckinx, W.; Van den Poel, D., Customer base analysis: Partial defection of behaviourally loyal clients in a non-contractual FMCG retail setting, European Journal of Operational Research, 164, 1, 252-268 (2005) · Zbl 1132.90349
[7] Burez, J.; Van den Poel, D., CRM at a pay-TV company: Using analytical models to reduce customer attrition by targeted marketing for subscription services, Expert Systems with Applications, 32, 2, 277-288 (2007)
[8] Burez, J.; Van den Poel, D., Separating financial from commercial customer churn: A modeling step towards resolving the conflict between the sales and credit department, Expert Systems with Applications, 35, 1-2, 497-514 (2008)
[9] Burez, J.; Van den Poel, D., Handling class imbalance in customer churn prediction, Expert Systems with Applications, 36, 3, 4626-4636 (2009)
[10] Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; Kegelmeyer, W. P., SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 321-357 (2002) · Zbl 0994.68128
[11] Chen, C.; Liaw, A.; Breiman, L., Using random forest to learn imbalanced data, Technical Report No. 666, University of California, Berkeley (2004)
[12] Chen, K.; Hu, Y.-H.; Hsieh, Y.-C., Predicting customer churn from valuable B2B customers in the logistics industry: A case study, Information Systems and e-Business Management, 13, 3, 475-494 (2015)
[13] Chen, Z.-Y.; Fan, Z.-P.; Sun, M., A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data, European Journal of Operational Research, 223, 2, 461-472 (2012) · Zbl 1292.68131
[14] Coussement, K.; De Bock, K. W., Customer churn prediction in the online gambling industry: The beneficial effect of ensemble learning, Journal of Business Research, 66, 9, 1629-1636 (2013)
[15] Coussement, K.; Lessmann, S.; Verstraeten, G., A comparative analysis of data preparation algorithms for customer churn prediction: A case study in the telecommunication industry, Decision Support Systems, 95, 27-36 (2017)
[16] Coussement, K.; Van den Poel, D., Churn prediction in subscription services: An application of support vector machines while comparing two parameter-selection techniques, Expert Systems with Applications, 34, 1, 313-327 (2008)
[17] Coussement, K.; Van den Poel, D., Improving customer attrition prediction by integrating emotions from client/company interaction emails and evaluating multiple classifiers, Expert Systems with Applications, 36, 3, 6127-6134 (2009)
[18] De Bock, K. W.; Van den Poel, D., Reconciling performance and interpretability in customer churn prediction using ensemble learning based on generalized additive models, Expert Systems with Applications, 39, 8, 6816-6826 (2012)
[19] De Caigny, A.; Coussement, K.; De Bock, K. W., A new hybrid classification algorithm for customer churn prediction based on logistic regression and decision trees, European Journal of Operational Research, 269, 2, 760-772 (2018) · Zbl 1388.90061
[20] Demšar, J., Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, 7, 1-30 (2006) · Zbl 1222.68184
[21] Dietterich, T. G., Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10, 7, 1895-1923 (1998)
[22] Egan, J. P., Signal detection theory and ROC-analysis, Series in Cognition and Perception (1975), Academic Press
[23] Farquad, M. A.; Ravi, V.; Raju, S. B., Churn prediction using comprehensible support vector machine: An analytical CRM application, Applied Soft Computing Journal, 19, 31-40 (2014)
[24] García, D. L.; Nebot, À.; Vellido, A., Intelligent data analysis approaches to churn as a business problem: A survey, Knowledge and Information Systems, 51, 3, 719-774 (2017)
[25] Glady, N.; Baesens, B.; Croux, C., Modeling churn using customer lifetime value, European Journal of Operational Research, 197, 1, 402-411 (2009) · Zbl 1157.91396
[26] Gür Ali, Ö.; Arıtürk, U., Dynamic churn prediction framework with more effective use of rare event data: The case of private banking, Expert Systems with Applications, 41, 17, 7889-7903 (2014)
[27] Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V., Gene selection for cancer classification using support vector machines, Machine Learning, 46, 1-3, 389-422 (2002) · Zbl 0998.68111
[28] Japkowicz, N.; Stephen, S., The class imbalance problem: A systematic study, Intelligent Data Analysis, 6, 5, 429-449 (2002) · Zbl 1085.68628
[29] Kohavi, R.; John, G. H., Wrappers for feature subset selection, Artificial Intelligence, 97, 1-2, 273-324 (1997) · Zbl 0904.68143
[30] Kumar, D. A.; Ravi, V., Predicting credit card customer churn in banks using data mining, International Journal of Data Analysis Techniques and Strategies, 1, 1, 4-28 (2008)
[31] Larivière, B.; Van den Poel, D., Investigating the role of product features in preventing customer churn, by using survival analysis and choice modeling: The case of financial services, Expert Systems with Applications, 27, 2, 277-285 (2004)
[32] Larivière, B.; Van den Poel, D., Predicting customer retention and profitability by using random forests and regression forests techniques, Expert Systems with Applications, 29, 2, 472-484 (2005)
[33] Lessmann, S.; Voß, S., A reference model for customer-centric data mining with support vector machines, European Journal of Operational Research, 199, 2, 520-530 (2009) · Zbl 1176.90340
[34] Leung, H. C.; Chung, W., A dynamic classification approach to churn prediction in banking industry, AMCIS 2020 Proceedings: Data Science and Analytics for Decision Support (SIGDSA) (2020), Association for Information Systems
[35] Miguéis, V.; Camanho, A.; Falcão e Cunha, J., Customer attrition in retailing: An application of Multivariate Adaptive Regression Splines, Expert Systems with Applications, 40, 16, 6225-6232 (2013)
[36] Miguéis, V.; Van den Poel, D.; Camanho, A.; Falcão e Cunha, J., Modeling partial customer churn: On the value of first product-category purchase sequences, Expert Systems with Applications, 39, 12, 11250-11256 (2012)
[37] Neslin, S. A.; Gupta, S.; Kamakura, W.; Lu, J.; Mason, C. H., Defection detection: Measuring and understanding the predictive accuracy of customer churn models, Journal of Marketing Research, 43, 2, 204-211 (2006)
[38] Óskarsdóttir, M.; Bravo, C.; Verbeke, W.; Sarraute, C.; Baesens, B.; Vanthienen, J., Social network analytics for churn prediction in telco: Model building, evaluation and network architecture, Expert Systems with Applications, 85, 204-220 (2017)
[39] Piatetsky-Shapiro, G.; Masand, B., Estimating campaign benefits and modeling lift, Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining - KDD ’99, 185-193 (1999), ACM Press: ACM Press New York, New York, USA
[40] Reichheld, F. F., Learning from customer defections, Harvard Business Review, 74, 2, 56-69 (1996)
[41] Risselada, H.; Verhoef, P. C.; Bijmolt, T. H., Staying power of churn prediction models, Journal of Interactive Marketing, 24, 3, 198-208 (2010)
[42] Seppälä, T.; Thuy, L., A combination of multi-period training data and ensemble methods to improve churn classification of housing loan customers, Proceedings of the 2nd international conference on advanced research methods and analytics (CARMA 2018), 141-144 (2018), Universitat Politècnica de València
[43] Somol, P.; Baesens, B.; Pudil, P.; Vanthienen, J., Filter- versus wrapper-based feature selection for credit scoring, International Journal of Intelligent Systems, 20, 10, 985-999 (2005)
[44] Tamaddoni Jahromi, A.; Stakhovych, S.; Ewing, M., Managing B2B customer churn, retention and profitability, Industrial Marketing Management, 43, 7, 1258-1268 (2014)
[45] Vapnik, V. N., The nature of statistical learning theory (1995), Springer: Springer New York, NY · Zbl 0833.62008
[46] Verbeke, W.; Dejaeger, K.; Martens, D.; Hur, J.; Baesens, B., New insights into churn prediction in the telecommunication sector: A profit driven data mining approach, European Journal of Operational Research, 218, 1, 211-229 (2012)
[47] Verbeke, W.; Martens, D.; Mues, C.; Baesens, B., Building comprehensible customer churn prediction models with advanced rule induction techniques, Expert Systems with Applications, 38, 3, 2354-2364 (2011)
[48] Wei, C.-P.; Chiu, I.-T., Turning telecommunications call details to churn prediction: A data mining approach, Expert Systems with Applications, 23, 2, 103-112 (2002)
[49] Weiss, G. M., Mining with rarity: A unifying framework, SIGKDD Explorations, 6, 1, 7-19 (2004)
[50] Wilcoxon, F., Individual comparisons by ranking methods, Biometrics Bulletin, 1, 6, 80-83 (1945)
[51] Zahavi, J.; Levin, N., Applying neural computing to target marketing, Journal of Direct Marketing, 11, 1, 5-22 (1997)
[52] Zeithaml, V. A.; Berry, L. L.; Parasuraman, A., The behavioral consequences of service quality, Journal of Marketing, 60, 2, 31-46 (1996)
[53] Zhu, B.; Baesens, B.; Backiel, A.; Vanden Broucke, S. K., Benchmarking sampling techniques for imbalance learning in churn prediction, Journal of the Operational Research Society, 69, 1, 49-65 (2017)