Fitting elephants in modern machine learning by statistically consistent interpolation

Textbook wisdom advocates for smooth function fits and implies that interpolating noisy data should lead to poor generalization. A related heuristic is that the number of fitted parameters should be smaller than the number of measurements (Occam’s razor). Surprisingly, contemporary machine learning approaches, such as deep nets, generalize well despite interpolating noisy data. This may be understood via statistically consistent interpolation (SCI), that is, data interpolation techniques that generalize optimally for big data. Here, we elucidate SCI using the weighted interpolating nearest neighbours (wiNN) algorithm, which adds singular weight functions to k nearest neighbours: the weights diverge at the training points, so the fit passes through every datum. This shows that data interpolation can be a valid machine learning strategy for big data. SCI clarifies the relation between two ways of modelling natural phenomena: the rationalist (strong priors) approach of theoretical physics, with few parameters, and the empiricist (weak priors) approach of modern machine learning, with more parameters than data. SCI shows that the purely empirical approach can successfully predict. However, data interpolation does not provide theoretical insights, and the training data requirements may be prohibitive. Complex animal brains lie between these extremes: they have many parameters but modest training data, with prior structure encoded in species-specific mesoscale circuitry. Thus, modern machine learning offers an epistemological approach distinct from both physical theories and animal brains.
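The abstract describes wiNN only in words, so the following Python/NumPy sketch may help make the idea concrete. It is an illustration under stated assumptions, not the paper's implementation: the weight function w_i = ||x - x_i||^(-delta), the parameter names k and delta and their values, the helper names winn_predict and winn_classify, and the synthetic data are all hypothetical choices made here. Analyses of similar singular-weight schemes typically restrict delta relative to the input dimension d, so the values below should be read as illustrative only. The regression part mirrors the linear-regression setting of Fig. 1, and the classification part the 2D setting of Fig. 2.

```python
import numpy as np


def winn_predict(X_train, y_train, X_query, k=20, delta=0.4, eps=1e-12):
    """Weighted interpolating nearest neighbours (hypothetical helper, sketch).

    For each query x, average the labels of its k nearest training points with
    singular weights w_i = ||x - x_i||**(-delta) (an assumed weight function).
    Because w_i diverges as x -> x_i, the estimate passes through every label.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    preds = np.empty(len(X_query))
    for j, x in enumerate(X_query):
        dist = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(dist)[:k]  # indices of the k nearest neighbours
        d = dist[nn]
        if d[0] < eps:
            # Query sits on a training point: the singular weight dominates,
            # so the limit of the weighted average is that point's label.
            preds[j] = y_train[nn[0]]
            continue
        w = d ** (-delta)  # singular weight function
        preds[j] = np.dot(w, y_train[nn]) / w.sum()
    return preds


def winn_classify(X_train, labels, X_query, **kwargs):
    """Hypothetical plug-in classifier: threshold the wiNN estimate of P(y=1|x) at 1/2."""
    return (winn_predict(X_train, labels, X_query, **kwargs) > 0.5).astype(int)


rng = np.random.default_rng(0)

# Regression: noisy samples of the line f(x) = 2x on [0, 1].
n, sigma = 2000, 0.3
X = rng.uniform(0.0, 1.0, size=(n, 1))
y = 2.0 * X[:, 0] + rng.normal(0.0, sigma, size=n)
X_test = rng.uniform(0.0, 1.0, size=(200, 1))

train_fit = winn_predict(X, y, X)  # reproduces the noisy training labels
test_mse = np.mean((winn_predict(X, y, X_test) - 2.0 * X_test[:, 0]) ** 2)
print("max |fit - y| on training data:", np.max(np.abs(train_fit - y)))
print("test MSE vs true line:", test_mse, "noise variance:", sigma ** 2)


def two_blobs(size):
    """Two Gaussian classes centred at (-2, 0) and (2, 0) with unit variance."""
    labels = rng.integers(0, 2, size=size)
    centers = np.where(labels[:, None] == 1, [2.0, 0.0], [-2.0, 0.0])
    return centers + rng.normal(size=(size, 2)), labels


# Classification in 2D: flip 10% of the training labels to inject label noise.
X2, labels_clean = two_blobs(1000)
flip = rng.random(len(labels_clean)) < 0.1
labels_noisy = np.where(flip, 1 - labels_clean, labels_clean)
X2_test, labels_test = two_blobs(500)

train_acc = np.mean(winn_classify(X2, labels_noisy, X2, k=15, delta=0.5) == labels_noisy)
test_acc = np.mean(winn_classify(X2, labels_noisy, X2_test, k=15, delta=0.5) == labels_test)
print("training accuracy on noisy labels:", train_acc)
print("test accuracy on clean labels:", test_acc)
```

Running the sketch should show a training fit that reproduces the noisy labels to floating-point precision while the test error against the true line stays well below the noise variance; likewise, the classifier reproduces every flipped training label yet classifies fresh points mostly correctly. The exact numbers depend on the seed, k and delta.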

Fig. 1: The wiNN algorithm applied to linear regression.
Fig. 2: Classification using wiNN, illustrated in 2D.
Fig. 3: SCI placed in context.
Fig. 4: Data-driven ML as a ‘third epistemology’.

This work was supported by the Crick–Clay Professorship (CSHL) and the H. N. Mahabala Chair Professorship (IIT Madras).

Correspondence to Partha P. Mitra.

Competing interests

The author declares no competing interests.

Peer review information

Nature Machine Intelligence thanks Samet Oymak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mitra, P.P. Fitting elephants in modern machine learning by statistically consistent interpolation. Nat Mach Intell 3, 378–386 (2021).

