Document Zbl 1464.62533

On algorithmic and modeling approaches to imputation in large data sets. (English) Zbl 1464.62533

Stat. Sin. 30, No. 4, 1685-1696 (2020).

Summary: The machine learning and statistical modeling cultures provide contrasting approaches to statistical analysis. In an article in this journal, W.-Y. Loh et al. [Stat. Sin. 29, No. 1, 431–453 (2019; Zbl 1412.62080)] compare these approaches in the setting of imputation of large data sets, recommending machine-learning methods. All the compared methods make assumptions, and I note that these assumptions receive more critical assessment for the model-based approaches than for the tree-based machine-learning methods. I discuss in particular the assumptions about the missing-data mechanism implied by the differing approaches. I question the extent to which general conclusions can be drawn from their simulation study, given the relatively strong performance of the method that discards the incomplete cases, and the limited exploration of the relevant design space.

Cited in 1 Document

MSC:

62R07	Statistical aspects of big data and data science
62H30	Classification and discrimination; cluster analysis (statistical aspects)
62D10	Missing data
68T05	Learning and adaptive systems in artificial intelligence

Keywords:

imputation; missing data; machine learning; nonresponse weighting; tree and forest methods

Citations:

Zbl 1412.62080
