
Machine learning with high-cardinality categorical features in actuarial applications. (English) Zbl 07866374

The study develops and extends a generalised linear mixed model (GLMM) within a deep learning framework, providing a novel generalised linear mixed model neural network ("GLMMNet") approach for modelling high-cardinality categorical features. The model offers a double benefit: first, by combining a deep neural network with the GLMM structure, it harnesses the predictive power of deep learning together with the statistical strengths of GLMMs; second, it is highly flexible and well suited to actuarial applications. The GLMMNet is applied to a real insurance dataset, and its performance is compared with that of the other models considered throughout the paper.
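The reviewed architecture rests on the GLMM-style decomposition of the linear predictor, \(\eta = f(X) + Zu\), where \(f\) is a neural network acting on the ordinary covariates and \(u\) collects Gaussian random effects, one per level of the high-cardinality categorical. A minimal NumPy sketch of this predictor structure follows; it is purely illustrative (untrained random weights, an assumed one-hidden-layer network, and made-up names such as `fixed_effects` and `sigma_u`), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 numeric features and one categorical feature with 100 levels.
n, p, q = 200, 5, 100
X = rng.normal(size=(n, p))
cat = rng.integers(0, q, size=n)  # observed level index for each record

# Fixed-effects part: a tiny one-hidden-layer network f(X).
# Random weights here stand in for a trained network.
W1 = rng.normal(scale=0.3, size=(p, 8))
b1 = np.zeros(8)
w2 = rng.normal(scale=0.3, size=8)

def fixed_effects(X):
    h = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ w2

# Random-effects part: one Gaussian intercept per category level,
# u_j ~ N(0, sigma_u^2), exactly as in a GLMM.
sigma_u = 0.5
u = rng.normal(scale=sigma_u, size=q)

# GLMM-style linear predictor: eta = f(X) + Z u,
# where Z u simply selects the effect of each record's level.
eta = fixed_effects(X) + u[cat]

# For, say, a Gamma response with log link, the mean is exp(eta).
mu = np.exp(eta)
print(mu.shape)  # (200,)
```

In the actual GLMMNet the network weights and the random-effect distribution are learned jointly (the paper uses variational inference for the latter); this sketch only shows how the two components combine in the predictor.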

MSC:

91G05 Actuarial mathematics
68T07 Artificial neural networks and deep learning

References:

[1] Al-Mudafer, M.T., Avanzi, B., Taylor, G., Wong, B., 2022. Stochastic loss reserving with mixture density neural networks. Insurance: Mathematics and Economics 105, 144-174. https://www.sciencedirect.com/science/article/pii/S0167668722000373, DOI: doi:10.1016/j.insmatheco.2022.03.010. · Zbl 1492.91270
[2] Antonio, K., Beirlant, J., 2007. Actuarial statistics with generalized linear mixed models. Insurance: Mathematics and Economics 40, 58-76. https://www.sciencedirect.com/science/article/pii/S0167668706000552, DOI: doi:10.1016/j.insmatheco.2006.02.013. · Zbl 1104.62111
[3] Antonio, K., Zhang, Y., 2014. Linear mixed models, in: Frees, E.W., Meyers, G., Derrig, R.A. (Eds.), Predictive Modeling Applications in Actuarial Science, Volume I: Predictive Modeling Techniques. Cambridge University Press, Cambridge. volume 1 of International Series on Actuarial Science, pp. 182-216. https://www.cambridge.org/core/books/predictive-modeling-applications-in-actuarial-science/linear-mixed-models/91FF971A2C418510F2DB8AED3368FF5B, DOI: doi:10.1017/CBO9781139342674.008.
[4] Blei, D.M., Kucukelbir, A., McAuliffe, J.D., 2017. Variational inference: A review for statisticians. Journal of the American Statistical Association 112, 859-877. DOI: doi:10.1080/01621459.2017.1285773. publisher: Taylor & Francis.
[5] Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D., 2015. Weight uncertainty in neural networks, in: Proceedings of the 32nd International Conference on Machine Learning, PMLR. pp. 1613-1622. https://proceedings.mlr.press/v37/blundell15.pdf.
[6] Bürkner, P.C., 2017. brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software 80, 1-28. DOI: doi:10.18637/jss.v080.i01.
[7] Casella, G., 1985. An introduction to empirical Bayes data analysis. The American Statistician 39, 83-87.
[8] Cerda, P., Varoquaux, G., Kégl, B., 2018. Similarity encoding for learning with dirty categorical variables. Machine Learning 107, 1477-1494. http://link.springer.com/10.1007/s10994-018-5724-2, DOI: doi:10.1007/s10994-018-5724-2.
[9] Chollet, F., et al., 2015. Keras. https://keras.io.
[10] Delong, Ł., Kozak, A., 2021. The use of autoencoders for training neural networks with mixed categorical and numerical features. SSRN. https://papers.ssrn.com/abstract=3952470, DOI: doi:10.2139/ssrn.3952470.
[11] Delong, Ł., Lindholm, M., Wüthrich, M.V., 2021. Gamma mixture density networks and their application to modelling insurance claim amounts. Insurance: Mathematics and Economics. https://linkinghub.elsevier.com/retrieve/pii/S0167668721001232, DOI: doi:10.1016/j.insmatheco.2021.08.003. · Zbl 1475.91294
[12] Denuit, M., Hainaut, D., Trufin, J., 2019. Effective Statistical Learning Methods for Actuaries I: GLMs and Extensions. Springer Actuarial, Springer International Publishing, Cham. http://link.springer.com/10.1007/978-3-030-25820-7, DOI: doi:10.1007/978-3-030-25820-7. · Zbl 1426.62003
[13] Embrechts, P., Wüthrich, M., 2022. Recent challenges in actuarial science. Annual Review of Statistics and Its Application 9. https://www.annualreviews.org/doi/10.1146/annurev-statistics-040120-030244, DOI: doi:10.1146/annurev-statistics-040120-030244.
[14] Ferrario, A., Noll, A., Wüthrich, M., 2020. Insights from inside neural networks. SSRN. https://papers.ssrn.com/abstract=3226852, DOI: doi:10.2139/ssrn.3226852.
[15] Frees, E.W., 2014. Longitudinal and panel data models, in: Frees, E.W., Meyers, G., Derrig, R.A. (Eds.), Predictive Modeling Applications in Actuarial Science, Volume I: Predictive Modeling Techniques. Cambridge University Press, Cambridge. volume 1 of International Series on Actuarial Science, pp. 167-181. https://www.cambridge.org/core/books/predictive-modeling-applications-in-actuarial-science/longitudinal-and-panel-data-models/FA9525C9E531966C9DD65A79C06B7888, DOI: doi:10.1017/CBO9781139342674.007.
[16] Friedman, J.H., 1991. Multivariate adaptive regression splines. The Annals of Statistics 19, 1-67. · Zbl 0765.62064
[17] Gelman, A., 2020. Prior choice recommendations. https://github.com/stan-dev/stan.
[18] Gelman, A., Hill, J., 2007. Data analysis using regression and multilevel/hierarchical models. Analytical Methods for Social Research, Cambridge University Press, Cambridge; New York.
[19] Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks, in: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings. pp. 249-256. https://proceedings.mlr.press/v9/glorot10a.html. ISSN: 1938-7228.
[20] Gneiting, T., Katzfuss, M., 2014. Probabilistic forecasting. Annual Review of Statistics and Its Application 1, 125-151. DOI: doi:10.1146/annurev-statistics-062713-085831.
[21] Gneiting, T., Raftery, A.E., 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102, 359-378. DOI: doi:10.1198/016214506000001437. publisher: Taylor & Francis. · Zbl 1284.62093
[22] Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press, Cambridge, MA. · Zbl 1373.68009
[23] Guiahi, F., 2017. Applying graphical models to automobile insurance data. Variance 11, 23-44.
[24] Guo, C., Berkhahn, F., 2016. Entity embeddings of categorical variables. arXiv:1604.06737 [cs] http://arxiv.org/abs/1604.06737. arXiv: 1604.06737.
[25] Hainaut, D., Trufin, J., Denuit, M., 2022. Response versus gradient boosting trees, GLMs and neural networks under Tweedie loss and log-link. Scandinavian Actuarial Journal, 1-26. DOI: doi:10.1080/03461238.2022.2037016. publisher: Taylor & Francis.
[26] Hajjem, A., Bellavance, F., Larocque, D., 2014. Mixed-effects random forest for clustered data. Journal of Statistical Computation and Simulation 84, 1313-1328. DOI: doi:10.1080/00949655.2012.741599. publisher: Taylor & Francis. · Zbl 1453.62543
[27] Hastie, T., Tibshirani, R., Friedman, J.H., 2009. The elements of statistical learning: data mining, inference, and prediction. Springer Series in Statistics. 2nd ed., Springer, New York, NY. · Zbl 1273.62005
[28] Henckaerts, R., Côté, M., Antonio, K., Verbelen, R., 2021. Boosting insights in insurance tariff plans with tree-based machine learning methods. North American Actuarial Journal 25, 255-285. DOI: doi:10.1080/10920277.2020.1745656. publisher: Routledge. · Zbl 1475.91306
[29] Jordan, A., Krüger, F., Lerch, S., 2019. Evaluating probabilistic forecasts with scoringrules. Journal of Statistical Software 90. http://www.jstatsoft.org/v90/i12/, DOI: doi:10.18637/jss.v090.i12.
[30] Jospin, L.V., Buntine, W., Boussaid, F., Laga, H., Bennamoun, M., 2022. Hands-on Bayesian neural networks – a tutorial for deep learning users. IEEE Computational Intelligence Magazine 17, 29-48. http://arxiv.org/abs/2007.06823, DOI: doi:10.1109/MCI.2022.3155327. arXiv:2007.06823 [cs, stat].
[31] Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv:1412.6980 [cs] http://arxiv.org/abs/1412.6980.
[32] Kullback, S., Leibler, R.A., 1951. On information and sufficiency. The Annals of Mathematical Statistics 22, 79-86. https://www.jstor.org/stable/2236703. publisher: Institute of Mathematical Statistics. · Zbl 0042.38403
[33] Kuo, K., Richman, R., 2021. Embeddings and attention in predictive modeling. arXiv:2104.03545 [q-fin, stat] http://arxiv.org/abs/2104.03545. arXiv: 2104.03545.
[34] Kuss, M., Pfingsten, T., Csato, L., Rasmussen, C., 2005. Approximate Inference for Robust Gaussian Process Regression. Technical Report 136. Max Planck Institute for Biological Cybernetics, Tübingen, Germany.
[35] Lakshmanan, V., Robinson, S., Munn, M., 2020. Machine learning design patterns: solutions to common challenges in data preparation, model building, and MLOps. First ed., O’Reilly Media, Sebastopol, CA.
[36] Mandel, F., Ghosh, R.P., Barnett, I., 2022. Neural networks for clustered and longitudinal data using mixed effects models. Biometrics. DOI: doi:10.1111/biom.13615. · Zbl 1522.62194
[37] McCulloch, C.E., Searle, S.R., 2001. Generalized, linear, and mixed models. Wiley Series in Probability and Statistics: Applied Probability and Statistics Section, John Wiley & Sons, New York.
[38] Neal, R.M., Hinton, G.E., 1998. A view of the EM algorithm that justifies incremental, sparse, and other variants, in: Jordan, M.I. (Ed.), Learning in Graphical Models. Springer Netherlands, Dordrecht, pp. 355-368. http://link.springer.com/10.1007/978-94-011-5014-9_12, DOI: doi:10.1007/978-94-011-5014-9_12. · Zbl 0916.62019
[39] Pargent, F., Pfisterer, F., Thomas, J., Bischl, B., 2022. Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics. DOI: doi:10.1007/s00180-022-01207-6. · Zbl 1505.62313
[40] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825-2830. · Zbl 1280.68189
[41] Pettifer, A., Pettifer, J., 2012. A practical guide to commercial insurance pricing, in: Australian Actuaries Institute General Insurance Seminar, Sydney. pp. 1-40. https://www.actuaries.asn.au/Library/Events/GIS/2012/GIS2012PaperAlinaPettifer.pdf.
[42] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., Gulin, A., 2018. CatBoost: unbiased boosting with categorical features, in: Advances in Neural Information Processing Systems, Curran Associates, Inc. pp. 6639-6649. https://proceedings.neurips.cc/paper/2018/hash/14491b756b3a51daac41c24863285549-Abstract.html.
[43] Richman, R., 2021a. AI in actuarial science – a review of recent advances – part 1. Annals of Actuarial Science 15, 207-229. https://www.cambridge.org/core/product/identifier/S1748499520000238/type/journal_article, DOI: doi:10.1017/S1748499520000238.
[44] Richman, R., 2021b. AI in actuarial science – a review of recent advances – part 2. Annals of Actuarial Science 15, 230-258. https://www.cambridge.org/core/product/identifier/S174849952000024X/type/journal_article, DOI: doi:10.1017/S174849952000024X.
[45] Richman, R., Wüthrich, M., 2023. High-cardinality categorical covariates in network regressions. SSRN. https://ssrn.com/abstract=4549049, DOI: doi:10.2139/ssrn.4549049.
[46] Richman, R., Wüthrich, M.V., 2021. A neural network extension of the Lee-Carter model to multiple populations. Annals of Actuarial Science 15, 346-366. https://www.cambridge.org/core/product/identifier/S1748499519000071/type/journal_article, DOI: doi:10.1017/S1748499519000071.
[47] Shi, P., Shi, K., 2021. Nonlife insurance risk classification using categorical embedding. SSRN. https://papers.ssrn.com/abstract=3777526, DOI: doi:10.2139/ssrn.3777526.
[48] Sigrist, F., 2021. Gaussian process boosting. arXiv:2004.02653 [cs.LG] http://arxiv.org/abs/2004.02653. arXiv:2004.02653.
[49] Sigrist, F., 2022. Latent Gaussian model boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press, 1-12. DOI: doi:10.1109/TPAMI.2022.3168152. arXiv:2105.08966.
[50] Simchoni, G., Rosset, S., 2022. Integrating random effects in deep neural networks. arXiv:2206.03314 [cs, stat] http://arxiv.org/abs/2206.03314. arXiv:2206.03314.
[51] State of New York, 2022. Assembled workers’ compensation claims: Beginning 2000. https://data.ny.gov/Government-Finance/Assembled-Workers-Compensation-Claims-Beginning-20/jshw-gkgu.
[52] Verbelen, R., 2019. There is (not) enough data! https://www.finity.com.au/publication/commercial-lines-seminar-2019-there-is-not-enough-data.
[53] Wüthrich, M., Buser, C., 2021. Data Analytics for Non-Life Insurance Pricing. Technical Report. Rochester, NY. https://papers.ssrn.com/abstract=2870308, DOI: doi:10.2139/ssrn.2870308.
[54] Wüthrich, M., Merz, M., 2019. Editorial: Yes, we CANN! ASTIN Bulletin 49, 1-3. DOI: doi:10.1017/asb.2018.42.
[55] Wüthrich, M.V., Merz, M., 2023. Statistical Foundations of Actuarial Learning and Its Applications. Springer Actuarial, Springer International Publishing, Cham. DOI: doi:10.1007/978-3-031-12409-9. · Zbl 1515.91003
[56] Zhang, C., Bütepage, J., Kjellström, H., Mandt, S., 2019. Advances in variational inference. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 2008-2026. DOI: doi:10.1109/TPAMI.2018.2889774.