Abstract
We consider a weighted local linear estimator based on the inverse selection probability for nonparametric regression with covariates missing at random. The asymptotic distribution of the maximal deviation between the estimator and the true regression function is derived, and an asymptotically accurate simultaneous confidence band is constructed. The estimator for the regression function is shown to be oracally efficient in the sense that it is uniformly indistinguishable from the estimator that uses the known selection probabilities. Finite-sample performance is examined via simulation studies, which support our asymptotic theory. The proposed method is demonstrated via an analysis of a data set from the Canada 2010/2011 Youth Student Survey.
References
Al Ahmari, T., Alomar, A., Al Beeybe, J., Asiri, N., Al Ajaji, R., Al Masoud, R., Al-Hazzaa, M. (2017). Associations of self-esteem with body mass index and body image among Saudi college-age females. Eating and Weight Disorders-Studies on Anorexia, Bulimia and Obesity, 1, 1–9.
Bickel, P., Rosenblatt, M. (1973). On some global measures of deviations of density function estimates. The Annals of Statistics, 1, 1071–1095.
Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.
Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. New York: Springer-Verlag.
Cai, L., Li, L., Huang, S., Ma, L., Yang, L. (2020). Oracally efficient estimation for dense functional data with holiday effects. Test, 29(1), 282–306. https://doi.org/10.1007/s11749-019-00655-5.
Cai, L., Liu, R., Wang, S., Yang, L. (2019). Simultaneous confidence bands for mean and variance functions based on deterministic design. Statistica Sinica, 29, 505–525.
Cai, T., Low, M., Ma, Z. (2014). Adaptive confidence bands for nonparametric regression functions. Journal of the American Statistical Association, 109, 1054–1070.
Cai, L., Yang, L. (2015). A smooth simultaneous confidence band for conditional variance function. Test, 24, 632–655.
Cao, G., Wang, L., Li, Y., Yang, L. (2016). Oracle efficient confidence envelopes for covariance functions in dense functional data. Statistica Sinica, 26, 359–383.
Cao, G., Yang, L., Todem, D. (2012). Simultaneous inference for the mean function based on dense functional data. Journal of Nonparametric Statistics, 24, 359–377.
Chen, H., Little, R. (1999). Proportional hazards regression with missing covariates. Journal of the American Statistical Association, 94, 896–908.
Chernozhukov, V., Chetverikov, D., Kato, K. (2014). Anti-concentration and honest, adaptive confidence bands. The Annals of Statistics, 42, 1787–1818.
Claeskens, G., Van Keilegom, I. (2003). Bootstrap confidence bands for regression curves and their derivatives. The Annals of Statistics, 31, 1852–1884.
Eubank, R., Speckman, P. (1993). Confidence bands in nonparametric regression. Journal of the American Statistical Association, 88, 1287–1301.
Fan, J., Gijbels, I. (1996). Local Polynomial Modeling and Its Applications. London: Chapman and Hall.
Fan, J., Zhang, W. (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics, 27, 715–731.
Gu, L., Wang, L., Härdle, W., Yang, L. (2014). A simultaneous confidence corridor for varying coefficient regression with sparse functional data. Test, 23, 806–843.
Gu, L., Yang, L. (2015). Oracally efficient estimation for single-index link function with simultaneous confidence band. Electronic Journal of Statistics, 9, 1540–1561.
Habib, F., Al Fozan, H., Barnawi, N., Al Motairi, W. (2015). Relationship between body mass index, self-esteem and quality of life among adolescent Saudi female. Journal of Biology, Agriculture and Healthcare, 5, 2224–3208.
Hall, P. (1991). On convergence rates of suprema. Probability Theory and Related Fields, 89, 447–455.
Hall, P., Titterington, D. (1988). On confidence bands in nonparametric density estimation and regression. Journal of Multivariate Analysis, 27, 228–254.
Härdle, W. (1989). Asymptotic maximal deviation of M-smoothers. Journal of Multivariate Analysis, 29, 163–179.
Härdle, W., Marron, J. (1991). Bootstrap simultaneous error bars for nonparametric regression. The Annals of Statistics, 19, 778–796.
Horvitz, D. G., Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47, 663–685.
Hosmer, D., Lemeshow, S. (2005). Applied Logistic Regression (2nd ed.). New York: Wiley.
Hsu, C., Long, Q., Li, Y., Jacobs, E. (2014). A nonparametric multiple imputation approach for data with missing covariate values with application to colorectal adenoma data. Journal of Biopharmaceutical Statistics, 24, 634–648.
Ibrahim, J. G., Chen, M.-H., Lipsitz, S. R., Herring, A. H. (2005). Missing-data methods for generalized linear models: A comparative review. Journal of the American Statistical Association, 100, 332–346.
Johnston, G. (1982). Probabilities of maximal deviations for nonparametric regression function estimates. Journal of Multivariate Analysis, 12, 402–414.
Kim, J. K., Shao, J. (2013). Statistical Methods for Handling Incomplete Data. London: Chapman and Hall.
Liang, H., Wang, S., Robins, J., Carroll, R. (2004). Estimation in partially linear models with missing covariates. Journal of the American Statistical Association, 99, 357–367.
Lipsitz, S. R., Ibrahim, J. G., Zhao, L.-P. (1999). A weighted estimating equation for missing covariate data with properties similar to maximum likelihood. Journal of the American Statistical Association, 94, 1147–1160.
Little, R., Rubin, D. (2019). Statistical Analysis with Missing Data (3rd ed.). New York: Wiley.
Qin, J., Zhang, B., Leung, D. (2009). Empirical likelihood in missing data problems. Journal of the American Statistical Association, 104, 1492–1503.
Robins, J., Rotnitzky, A., Zhao, L. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89, 846–866.
Rosenblatt, M. (1952). Remarks on a multivariate transformation. The Annals of Mathematical Statistics, 23, 470–472.
Silverman, B. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Song, Q., Yang, L. (2009). Spline confidence bands for variance function. Journal of Nonparametric Statistics, 21, 589–609.
Tusnády, G. (1977). A remark on the approximation of the sample df in the multidimensional case. Periodica Mathematica Hungarica, 8, 53–55.
Wang, Q. (2009). Statistical estimation in partial linear models with covariate data missing at random. Annals of the Institute of Statistical Mathematics, 61, 47–84.
Wang, J. (2012). Modelling time trend via spline confidence band. Annals of the Institute of Statistical Mathematics, 64, 275–301.
Wang, C., Wang, S., Carroll, R. (1998). Local linear regression for generalized linear models with missing data. The Annals of Statistics, 26, 1028–1050.
Wang, C., Wang, S., Zhao, L.-P., Ou, S.-T. (1997). Weighted semiparametric estimation in regression analysis with missing covariate data. Journal of the American Statistical Association, 92, 512–525.
Wang, J., Yang, L. (2009). Polynomial spline confidence bands for regression curves. Statistica Sinica, 19, 325–342.
Zhao, Z., Wu, W. (2008). Confidence bands in nonparametric time series regression. The Annals of Statistics, 36, 1854–1878.
Zheng, S., Liu, R., Yang, L., Härdle, W. (2016). Statistical inference for generalized additive models: simultaneous confidence corridors and variable selection. Test, 25, 607–626.
Zheng, S., Yang, L., Härdle, W. (2014). A smooth simultaneous confidence corridor for the mean of sparse functional data. Journal of the American Statistical Association, 109, 661–673.
Zhou, S., Shen, X., Wolfe, D. (1998). Local asymptotics of regression splines and confidence regions. The Annals of Statistics, 26, 1760–1782.
Acknowledgements
We would like to thank the Associate Editor and two referees for their helpful comments and suggestions that substantially improved an earlier version of this manuscript. This research was supported in part by the National Natural Science Foundation of China Award NSFC #11901521, #11701403, First Class Discipline of Zhejiang–A (Zhejiang Gongshang University–Statistics), Zhejiang Province Statistical Research Program #20TJQN04, the 2017 Jiangsu Overseas Visiting Scholar Program for University Prominent Young & Middle-aged Teachers and Presidents, Jiangsu Province Key-Discipline Program (Statistics) GD10700118, the National Natural Science Foundation of China (General Program 11871460, Key Program 11331011 and Program for Creative Research Group in China 61621003), a grant from the Key Lab of Random Complex Structure and Data Science, CAS, China, and the Simons Foundation Mathematics and Physical Sciences Program Award #499650.
Appendices
A. Appendix
We use \(a_{n}\sim b_{n}\) to represent \(\lim _{n\rightarrow \infty }a_{n}/b_{n}=c,\) where c is some nonzero constant. For any function \( \varphi \left( u\right) \) defined on \(\left[ a,b\right] \), let \(\left\| \varphi \left( u\right) \right\| _{\infty }=\left\| \varphi \right\| _{\infty }=\sup _{u\in \left[ a,b\right] }\left| \varphi \left( u\right) \right| \).
A.1 Preliminaries
This section gives some lemmas that are needed in our theoretical development. Their proofs are given in the Supplementary Material.
Lemma 1
(Theorem 1.2 of Bosq (1998)) Let \(\xi _{1},\dots ,\xi _{n}\) be independent random variables with mean 0. If there exists \(c>0\) such that (Cramér’s Conditions)
then for any \(t>0\),
Lemma 2
Under Assumptions (A1)–(A5), for any integer \(l\ge 0\), as \(n\rightarrow \infty \), one has
In the following, we discuss the representations of the weighted estimators \( {\hat{m}}\left( x,\pi \right) \) and \({\hat{m}}\left( x,\hat{\pi }\right) \), and break the errors \({\hat{m}}\left( x,\pi \right) -m\left( x\right) \) and \({\hat{m}}\left( x,\hat{\pi }\right) -m\left( x\right) \) into simpler parts to prove Theorems 1 and 3. Let
and
Then
where \(e_{1}=\left( 0,1\right) ^{T}\). By (4), one then has
To further study \({\hat{m}}\left( x,\hat{\pi }\right) \), let
and
By (5), one then obtains that
Lemma 3
Under Assumptions (A1) and (A3)–(A5), as \(n\rightarrow \infty \), uniformly for all \( x \in \left[ a_{0},b_{0}\right] \), one has
Lemma 4
Under Assumptions (A1)–(A5), as \(n\rightarrow \infty \), uniformly for all \( x \in \left[ a_{0},b_{0}\right] \), one has
and
Lemma 5
Under Assumptions (A1)–(A5), as \(n\rightarrow \infty \), one has
and
A.2 Conditional limiting extreme value distribution of \(V_{n}(x)\)
This section contains the main steps for obtaining the conditional extreme value distribution of \(V_{n}(x)=n^{-1}f_{X}^{-1}\left( x\right) \sum \limits _{i=1}^{n}\frac{\delta _{i}}{\pi _{i}}K_{h}\left( X_{i}-x\right) \varepsilon _{i}\), stated as Theorem 6 at the end of this section. That result is then used with the total probability formula in the proof of Theorem 2.
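As a minimal numerical sketch of the quantity \(V_{n}(x)\) defined above, the following code evaluates it on a grid for simulated data. Everything specific here is an assumption made for illustration and is not taken from the paper: the quartic kernel, the convention \(K_{h}(u)=K(u/h)/h\), the uniform design (so \(f_{X}\equiv 1\)), the regression function, the logistic form of the selection probability in the response, and the function name `v_n`.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel on [-1, 1]; an assumed choice, not taken from the paper."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def v_n(x_grid, X, eps, delta, pi, h, f_X):
    """Evaluate V_n(x) = n^{-1} f_X(x)^{-1} sum_i (delta_i/pi_i) K_h(X_i - x) eps_i."""
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    n = X.shape[0]
    # K_h(u) = K(u / h) / h for every (grid point, observation) pair.
    K_h = quartic_kernel((X[None, :] - x_grid[:, None]) / h) / h
    weights = delta / pi * eps  # zero whenever the covariate is missing
    return (K_h @ weights) / (n * f_X(x_grid))

# Hypothetical simulation design (all choices below are assumptions).
rng = np.random.default_rng(0)
n, h = 2000, 0.1
X = rng.uniform(0.0, 1.0, size=n)             # uniform design, so f_X = 1
eps = rng.normal(0.0, 0.5, size=n)
Y = np.sin(2.0 * np.pi * X) + eps
pi = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * Y)))   # selection probability given Y
delta = (rng.uniform(size=n) < pi).astype(float)

x_grid = np.linspace(0.1, 0.9, 81)
V = v_n(x_grid, X, eps, delta, pi, h, f_X=lambda x: np.ones_like(x))
sup_V = float(np.max(np.abs(V)))
print(f"sup |V_n(x)| on the grid: {sup_V:.4f}")
```

In the simulation the covariate is generated for every unit, but terms with \(\delta _{i}=0\) are multiplied by zero, so only the observed covariates contribute, as in the definition.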
The Rosenblatt quantile transformation in Rosenblatt (1952) is adopted with
where \(F_{X|\delta =1}\left( X\right) \) is the conditional distribution function of X given \(\delta =1\) and \(F_{\varepsilon |X,\delta =1}\left( \varepsilon |X\right) \) is the conditional distribution function of \( \varepsilon \) given X and \(\delta =1\). This transformation produces mutually independent uniform random variables \(\left( X^{*},\varepsilon ^{*}\right) \) on \(\left[ 0,1\right] ^{2}\). According to the strong approximation theorem in Tusnády (1977) (Theorem 1), there exists a sequence of two-dimensional Brownian bridges \(B_{n}\) such that
where \(Z_{n}\left( x,\varepsilon \right) =n^{1/2}\left\{ F_{n}\left( x,\varepsilon \right) -F_{X,\varepsilon |\delta =1}\left( x,\varepsilon \right) \right\} \) with \(F_{n}\left( x,\varepsilon \right) \) and \( F_{X,\varepsilon |\delta =1}\left( x,\varepsilon \right) \) representing the empirical and the theoretical distribution functions of \(\left( X,\varepsilon \right) \) given \(\delta =1\). The transformation and the strong approximation result have also been used in Johnston (1982), Härdle (1989), and Wang and Yang (2009) to construct SCBs for nonparametric regression when the data are fully observed.
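The effect of the Rosenblatt transformation can be illustrated numerically. The sketch below applies the marginal distribution function to X and the conditional distribution function to \(\varepsilon \) in a toy model where both are known in closed form. The toy distributions are hypothetical, and the conditioning on \(\delta =1\) is omitted for simplicity, so this is not the paper's setting, only an illustration of why the transformed pair is uniform with independent coordinates.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal distribution function via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z) / math.sqrt(2.0)))

rng = np.random.default_rng(1)
n = 20000

# Toy model (hypothetical): X has density 2x on [0, 1], so F_X(x) = x^2,
# and eps | X ~ N(0, (0.5 + X)^2), so eps depends on X through its scale.
X = np.sqrt(rng.uniform(size=n))
sigma = 0.5 + X
eps = sigma * rng.normal(size=n)

# Rosenblatt transformation: marginal CDF of X, then conditional CDF of eps given X.
X_star = X**2
eps_star = std_normal_cdf(eps / sigma)

# Before the transformation the spread of eps grows with X.
corr_raw = float(np.corrcoef(X, np.abs(eps))[0, 1])
# After it, the same dependence diagnostic should be close to zero.
corr_star = float(np.corrcoef(X_star, np.abs(eps_star - 0.5))[0, 1])

print(f"mean of X*: {X_star.mean():.3f}, mean of eps*: {eps_star.mean():.3f}")
print(f"dependence before: {corr_raw:.3f}, after: {corr_star:.3f}")
```

The correlation diagnostic only checks one form of dependence; it is used here because the toy model's dependence is entirely through the conditional scale.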
To obtain the distribution of \(\sup _{x\in \left[ a_{0},b_{0} \right] }\left| V_n(x)\right| \) conditional on \(\Delta _{n}=n_{0}\), we will show the following Lemmas 6–8. Here \(\{n_{0}\}\) is a sequence of numbers related to n with \(1\le n_{0}\le n\). By (6) it is clear that there exists a constant \(r>0\) such that \(r \le \Delta _n/n\le 1\) in probability as \(n \rightarrow \infty \). Thus we only need to consider \(n_0 \ge r\times n\). That is, \(n_0\) and n have the same order as \(n \rightarrow \infty \). Therefore, to unify the notation in the following we will use n in the convergence rate.
Meanwhile, because the data are i.i.d., conditioning on \(\Delta _{n}=\sum _{i=1}^{n}\delta _{i}=n_{0}\) is equivalent to conditioning on the event that \(n_0\) elements of \(\varvec{\delta }_{n}=\left( \delta _{1},...,\delta _{n}\right) ^{T}\) equal 1 and the remaining \((n-n_0)\) elements equal 0. Without loss of generality, let \(\delta _{i}=1\) for \(i=1,\dots ,n_{0}\) and \(\delta _{i}=0\) for \(i=n_{0}+1,\dots ,n\).
Notice that, for \(i=1,\dots ,n\),
Thus, conditional on \(\Delta _{n}=n_{0}\), \(1\le n_{0}\le n\), by symmetry one has
and
Moreover, as discussed above, conditional on \(\Delta _{n}=n_{0}\) one can let \(\delta _{i}=1\) for \(i=1,\dots ,n_{0}\) and \(\delta _{i}=0\) for \(i=n_{0}+1,\dots ,n\) without loss of generality. Then conditional on \(\Delta _{n}=n_{0}\) one can write
Conditional on \(\Delta _{n}=n_{0}\) we now introduce the following standardized stochastic process:
which can be rewritten as
where \(Z_{n_{0}}\left( u,\varepsilon \right) \) is the same as \(Z_{n}\left( u,\varepsilon \right) \) in (13) but with n replaced by \(n_{0}\).
Let \(\kappa _{n}=n^{\theta }\) with \(\frac{2}{3\eta }<\theta <\frac{1}{6}\) where \(\eta >4\) is given in Assumption (A2), which together with Assumption (A5) implies that
Then conditional on \(\Delta _{n}=n_{0}\) one can define the following processes to approximate \(\zeta _{1n_{0}}\left( x\right) \):
where \(s_{n}\left( x\right) =\int _{\left| \varepsilon \right| \le \kappa _{n}}\frac{\varepsilon ^{2}}{\pi ^{2}\left( m\left( x\right) +\varepsilon \right) }f_{X,\varepsilon |\delta =1}\left( x,\varepsilon \right) d\varepsilon \), \(B_{n_{0}}\left( T\left( u,\varepsilon \right) \right) \) is the sequence of Brownian bridges in (13), and \(W_{n_{0}} \left( T\left( u,\varepsilon \right) \right) \) is the sequence of Wiener processes satisfying \( B_{n_{0}}\left( u,s\right) =W_{n_{0}}\left( u,s\right) -usW_{n_{0}}\left( 1,1\right) \). Moreover, define
and
where \(W\left( u\right) \) is a two-sided Wiener process on \(\left( -\infty ,+\infty \right) \). Conditional on \(\Delta _{n}=n_{0}\), according to Theorem 3.1 in Bickel and Rosenblatt (1973), one has
\(\forall t\in \mathbb {R}\), as \(n_0 \) (and thus n) \(\rightarrow \infty \). Here \(a_{h},b_{h}\), and \(\lambda \left( K\right) \) are given in Theorem 2.
The proofs of Lemmas 6 and 7 below are given in the Supplementary Material owing to space limitations.
Lemma 6
Under Assumptions (A1)–(A5), conditional on \( \Delta _{n}=n_{0}\), for an increasing sequence \(\{n_0\}\), as \(n_0 \rightarrow \infty \), one has
Lemma 7
Conditional on \(\Delta _{n}=n_{0}\) for an increasing sequence \(\{n_0\}\), the stochastic processes \(\zeta _{4n_{0}}\left( x\right) \) and \(\zeta _{5n_{0}}\left( x\right) \) have the same asymptotic distribution as \(n_0 \rightarrow \infty \).
Lemmas 6 and 7, expression (17), and Slutsky’s Theorem imply that
\(\forall t\in \mathbb {R}\), as \(n_0 \rightarrow \infty \).
Lemma 8
Under Assumptions (A1)–(A5), conditional on \( \Delta _{n}=n_{0}\) for an increasing sequence \(\{n_0\}\), one has
as \(n_0 \rightarrow \infty \).
Proof of Lemma 8. Define
To prove the lemma, it is sufficient to prove that conditional on \( \Delta _{n}=n_{0}\)
and
as \(n_0 \rightarrow \infty \). In the following, we first show (20). By (18) and the fact that \(b_{h}=O\left( \log ^{1/2}n\right) \), one has \(\sup _{x\in \left[ a_{0},b_{0}\right] }\left| \zeta _{2n_{0}}\left( x\right) \right| =O_{p}\left( \log ^{1/2}n\right) \), which together with (S.6) in the Supplementary Material implies that
We next prove (19). Notice that
For convenience, we denote
To prove (19), it is sufficient to verify that
By Theorem 15.6 in Billingsley (1968), it suffices to show: (i) conditional on \(\Delta _{n}=n_{0}\), \(\rightarrow 0\) in probability for any given \(x\in \left[ a_{0},b_{0}\right] \) and (ii) the tightness of conditional on \(\Delta _{n}=n_{0}\), using the following moment condition:
for any \(x\in \left[ x_{1},x_{2}\right] \) and some constant \( C>0\) that is independent of \(n_0\).
Firstly, note that \(\varsigma _{i,n}\left( x\right) ,1\le i\le n\), are independent variables with and
Thus, by (16), one has ,
Secondly, notice that
Since \(K\left( u\right) \in C^{\left( 1\right) }\left[ -1,1\right] \) by Assumption (A3),
and
for some constant \(C_{1}>0\) that is independent of \(n_0\). Therefore, by the Schwarz inequality, one has that
which together with (16) concludes that
for some \(C>0\) that is independent of \(n_0\), verifying the tightness. The proof is completed. \(\Box \)
By the definitions of \( V_{n}\left( x\right) \) in Theorem 1 and \(\zeta _{1n_{0}}(x)\) in (15), one has \(\zeta _{1n_{0}}\left( x\right) =\left( nh\right) ^{1/2}r_{n}^{-1/2}s^{-1/2}\left( x\right) f_{X}\left( x\right) V_{n}\left( x\right) \) given \(\Delta _{n}=n_{0}\). This, together with Lemma 8, expression (18), and Slutsky’s Theorem, yields the following result.
Theorem 6
Under Assumptions (A1)–(A5), one has that, for any \(t\in \mathbb {R}\), as \( n_{0}\rightarrow \infty ,\)
A.3 Proofs of the theorems in Section 2
Proof of Theorem 1. By Lemma 3 and Assumption (A5), one has
which implies that
This, together with (11) and Lemmas 2 and 4, implies that uniformly for all \(x\in \left[ a_{0},b_{0}\right] \),
The proof is completed. \(\Box \)
Proof of Theorem 2. According to Theorem 6, for any \(t\in \mathbb {R}\), as \(n_{0}\rightarrow \infty \),
Thus one has that for any given \(\epsilon >0\) and \(t\in \mathbb {R}\), there exists \(N_{0}>0\) such that
for all \(n_{0}\ge N_{0}\). On the other hand, since \(\Delta _{n}/n\rightarrow P\left( \delta _{1}=1\right) >0\) a.s., there exists \( N>N_{0}\) such that \(P\left( \Delta _{n}\ge N_{0}\right) >1-\epsilon /2\) whenever \(n\ge N\). Therefore, without conditioning on \(\Delta _{n}\), for \(n\ge N\),
This, together with the fact that the dominating term of \( {\hat{m}}\left( x,\pi \right) -m\left( x\right) \) is \( V_{n}(x) \), as seen in Theorem 1, establishes Theorem 2. \(\Box \)
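The total probability step in this proof can be written out as follows. This is a sketch in shorthand that is not the paper's own notation: \(A_{n}(t)\) stands for the event whose conditional probability is controlled by Theorem 6, \(G(t)\) for its limit, and it is assumed that the conditional approximation error is below \(\epsilon /2\) for every \(n_{0}\ge N_{0}\). Terms with \(n_{0}<N_{0}\) are bounded using the fact that the absolute difference of two probabilities is at most 1.

```latex
\begin{align*}
\left| P\{A_n(t)\} - G(t) \right|
&= \Big| \sum_{n_0=0}^{n} \big[ P\{A_n(t) \mid \Delta_n = n_0\} - G(t) \big]
   P(\Delta_n = n_0) \Big| \\
&\le \sum_{n_0 \ge N_0} \big| P\{A_n(t) \mid \Delta_n = n_0\} - G(t) \big|
   P(\Delta_n = n_0) + P(\Delta_n < N_0) \\
&< \epsilon/2 + \epsilon/2 = \epsilon , \qquad n \ge N .
\end{align*}
```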
Proof of Theorem 3. By (11) and (12) one has
By Lemma 5, it is easily seen that
completing the proof. \(\Box \)
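The oracle efficiency statement can be explored numerically by comparing the weighted local linear estimator computed with the true selection probabilities against the one computed with estimated probabilities. The sketch below is an illustration under stated assumptions, not the authors' implementation: the quartic kernel, the bandwidth, the simulation design, and the logistic working model for the selection probability in the always-observed response are all hypothetical choices, and `fit_logistic` and `weighted_local_linear` are names introduced here.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic kernel on [-1, 1]; an assumed choice."""
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def fit_logistic(y, delta, n_iter=25):
    """Fit P(delta = 1 | y) = expit(b0 + b1 * y) by Newton-Raphson; return fitted probabilities."""
    Z = np.column_stack([np.ones_like(y), y])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (delta - p)
        hess = Z.T @ (Z * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Z @ beta))

def weighted_local_linear(x_grid, X, Y, delta, pi, h):
    """Local linear fit with weights (delta_i / pi_i) K_h(X_i - x); returns the intercepts."""
    obs = delta == 1                      # only units with observed covariate contribute
    Xo, Yo, ipw = X[obs], Y[obs], 1.0 / pi[obs]
    est = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        d = Xo - x
        w = ipw * quartic_kernel(d / h) / h
        D = np.column_stack([np.ones_like(d), d])
        coef = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * Yo))
        est[j] = coef[0]
    return est

# Hypothetical simulation design.
rng = np.random.default_rng(2)
n, h = 2000, 0.15
X = rng.uniform(0.0, 1.0, size=n)
Y = np.sin(2.0 * np.pi * X) + rng.normal(0.0, 0.5, size=n)
pi_true = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * Y)))
delta = (rng.uniform(size=n) < pi_true).astype(float)
pi_hat = fit_logistic(Y, delta)

x_grid = np.linspace(0.1, 0.9, 41)
m_pi = weighted_local_linear(x_grid, X, Y, delta, pi_true, h)
m_pi_hat = weighted_local_linear(x_grid, X, Y, delta, pi_hat, h)

gap = float(np.max(np.abs(m_pi_hat - m_pi)))                       # effect of estimating pi
err = float(np.max(np.abs(m_pi - np.sin(2.0 * np.pi * x_grid))))   # estimation error itself
print(f"sup gap between the two estimators: {gap:.4f}")
print(f"sup error of the known-pi estimator: {err:.4f}")
```

A single simulated data set cannot verify an asymptotic statement; the comparison of `gap` with `err` only indicates the relative size of the two quantities in this one hypothetical design.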
Proof of Theorem 5. By definition,
Firstly, we study the uniform convergence property of \(\frac{h}{n}\sum \limits _{i=1}^{n}\frac{\delta _{i}}{\hat{\pi }_{i}^{2}}K_{h}^{2}\left( X_{i}\!-\!x\right) \hat{\varepsilon }_{i}^{2}\). Notice that
and
which imply that
Secondly, denote . By applying the inequality in Lemma 1, the Borel-Cantelli Lemma, and the truncation and discretization method as in the proof of Lemma 2, one obtains that
as \(n\rightarrow \infty \). Meanwhile, similar to the proof of Lemma 3, one can easily show that
Combining (23), (24), and (25), one has
Meanwhile, by Lemmas 3 and 5, and \(h_f=O(n^{-1/5})\), one can easily obtain that
Thus,
which together with the fact that
implies
It is easily seen from (22), (6), and (26) that
completing the proof. \(\Box \)
About this article
Cite this article
Cai, L., Gu, L., Wang, Q. et al. Simultaneous confidence bands for nonparametric regression with missing covariate data. Ann Inst Stat Math 73, 1249–1279 (2021). https://doi.org/10.1007/s10463-021-00784-5