Calibrating prediction regions. (English) Zbl 0706.62085

Summary: Suppose that the variable X to be predicted and the learning sample \(Y_n\) that was observed have a joint distribution that depends on an unknown parameter \(\theta\). The parameter \(\theta\) can be finite- or infinite-dimensional. A prediction region \(D_n\) for X is a random set, depending on \(Y_n\), that contains X with prescribed probability \(\alpha\). This article studies methods for controlling simultaneously the conditional coverage probability of \(D_n\), given \(Y_n\), and the overall (unconditional) coverage probability of \(D_n\). The basic construction yields a prediction region \(D_n\) with the following properties in regular models: both the conditional and the overall coverage probabilities of \(D_n\) converge to \(\alpha\) as the size n of the learning sample increases, the convergence of the former being in probability. Moreover, the asymptotic distribution of the conditional coverage probability about \(\alpha\) is typically normal, and the overall coverage probability tends to \(\alpha\) at rate \(n^{-1}\).
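To fix ideas, here is a standard illustration of the \(n^{-1}\) rate, not taken from the paper, in a Gaussian location model with known variance: if \(Y_1,\dots,Y_n,X\) are i.i.d. \(N(\theta,\sigma^2)\) and \(D_n = [\bar Y_n - z\sigma,\, \bar Y_n + z\sigma]\) with \(z = \Phi^{-1}((1+\alpha)/2)\), then \(X - \bar Y_n \sim N(0, \sigma^2(1+1/n))\), so the overall coverage probability is
\[
P(X \in D_n) = 2\Phi\bigl(z(1+1/n)^{-1/2}\bigr) - 1 = \alpha - \frac{z\varphi(z)}{n} + O(n^{-2}).
\]
Replacing \(z\) by \(z(1+1/n)^{1/2}\) makes the coverage exactly \(\alpha\) in this model, an instance of the analytical calibration mentioned below.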
Can one reduce the dispersion of the conditional coverage probability about \(\alpha\) and increase the rate at which the overall coverage probability converges to \(\alpha\)? Both issues are addressed. The article establishes a lower bound for the asymptotic dispersion of the conditional coverage probability. It also shows how to calibrate \(D_n\) so as to make its overall coverage probability converge to \(\alpha\) at the faster rate \(n^{-2}\). To first order, this calibration adjustment does not affect the asymptotic distribution or dispersion of the conditional coverage probability. In general, a bootstrap Monte Carlo algorithm accomplishes the calibration of \(D_n\); in special cases, analytical calibration is possible.
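As a concrete, illustrative rendering of the bootstrap calibration step (a minimal sketch under an assumed Gaussian model, not the paper's general algorithm), the following Python code calibrates a plug-in prediction interval by a parametric bootstrap: for each bootstrap replicate it records the smallest nominal level at which the plug-in interval would have covered the simulated future observation, then builds the interval at the \(\alpha\)-quantile of those levels. The function name and estimators are choices made for the sketch.

import numpy as np
from scipy.stats import norm

def calibrated_gaussian_interval(y, alpha=0.95, n_boot=4000, seed=0):
    # Illustrative bootstrap-calibrated prediction interval for a future X
    # drawn from the same N(theta, sigma^2) population as the learning
    # sample y; a sketch of the calibration idea, not the paper's method.
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    theta_hat, sigma_hat = y.mean(), y.std(ddof=1)

    # Parametric bootstrap: simulate learning samples and future
    # observations from the fitted model.
    y_star = rng.normal(theta_hat, sigma_hat, size=(n_boot, n))
    x_star = rng.normal(theta_hat, sigma_hat, size=n_boot)
    m = y_star.mean(axis=1)
    s = y_star.std(axis=1, ddof=1)

    # Smallest nominal level beta at which the plug-in interval
    # m +/- Phi^{-1}((1+beta)/2) * s covers the simulated X.
    needed = 2.0 * norm.cdf(np.abs(x_star - m) / s) - 1.0

    # Calibration: choose the nominal level as the alpha-quantile of the
    # required levels, so the plug-in interval built at that level has
    # bootstrap (estimated overall) coverage alpha.
    level = np.quantile(needed, alpha)
    z = norm.ppf((1.0 + level) / 2.0)
    return theta_hat - z * sigma_hat, theta_hat + z * sigma_hat

For example, calibrated_gaussian_interval(y, alpha=0.95) returns the endpoints of the calibrated region; the uncalibrated plug-in interval corresponds to using \(\alpha\) itself as the nominal level.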

MSC:

62M20 Inference from stochastic processes and prediction
62E20 Asymptotic distribution theory in statistics
Full Text: DOI