On algorithmic and modeling approaches to imputation in large data sets. (English) Zbl 1464.62533

Summary: The machine learning and statistical modeling cultures provide contrasting approaches to statistical analysis. In an article in this journal, W.-Y. Loh et al. [Stat. Sin. 29, No. 1, 431–453 (2019; Zbl 1412.62080)] compare these approaches in the setting of imputation of large data sets, recommending machine-learning methods. All the compared methods make assumptions, and I note that these assumptions receive more critical assessment for the model-based approaches than for the tree-based machine-learning methods. I discuss in particular the assumptions about the missing-data mechanism implied by the differing approaches. I question the extent to which general conclusions can be drawn from their simulation study, given the relatively strong performance of the method that discards the incomplete cases, and the limited exploration of the relevant design space.

MSC:

62R07 Statistical aspects of big data and data science
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62D10 Missing data
68T05 Learning and adaptive systems in artificial intelligence

Citations:

Zbl 1412.62080