Gaussian linear model selection in a dependent context. (English) Zbl 1471.62310

Summary: In this paper, we study the nonparametric linear model, when the error process is a dependent Gaussian process. We focus on the estimation of the mean vector via a model selection approach. We first give the general theoretical form of the penalty function, ensuring that the penalized estimator among a collection of models satisfies an oracle inequality. Then we derive a penalty shape involving the spectral radius of the covariance matrix of the errors, which can be chosen proportional to the dimension when the error process is stationary and short range dependent. However, this penalty can be too rough in some cases, in particular when the error process is long range dependent. In a second part, we focus on the fixed-design regression model assuming that the error process is a stationary Gaussian process. We propose a model selection procedure in order to estimate the mean function via piecewise polynomials on a regular partition, when the error process is either short range dependent, long range dependent or anti-persistent. We present different kinds of penalties, depending on the memory of the process. For each case, an adaptive estimator is built, and the rates of convergence are computed. Thanks to several sets of simulations, we study the performance of these different penalties for all types of errors (short memory, long memory and anti-persistent errors). Finally, we give an application of our method to the well-known Nile data, which clearly shows that the type of dependence of the error process must be taken into account.
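Illustration (not from the paper): the spectral-radius penalty described in the summary can be sketched as a penalized least-squares criterion. The sketch below makes several assumptions that the summary does not state: the covariance matrix `sigma` of the errors is treated as known, the models are piecewise constants (degree-0 piecewise polynomials) on a regular partition, the criterion is left unnormalized, and the constant `K = 2.0` is an arbitrary choice. The function names are hypothetical.

```python
import numpy as np

def piecewise_constant_fit(y, d):
    """Least-squares projection of y onto piecewise constants on d regular cells (requires d <= len(y))."""
    n = len(y)
    # Assign each design point i/n to one of d cells of (nearly) equal size.
    cells = (np.arange(n) * d) // n
    counts = np.bincount(cells, minlength=d)
    means = np.bincount(cells, weights=y, minlength=d) / counts
    return means[cells]

def select_dimension(y, sigma, dims, K=2.0):
    """Pick the dimension minimizing ||y - fit_d||^2 + K * rho(sigma) * d.

    rho(sigma) is the spectral radius of the (symmetric) error covariance matrix.
    K is an assumed tuning constant, not a value taken from the paper.
    """
    rho = np.max(np.abs(np.linalg.eigvalsh(sigma)))
    crit = {}
    for d in dims:
        fit = piecewise_constant_fit(y, d)
        crit[d] = float(np.sum((y - fit) ** 2) + K * rho * d)
    best = min(crit, key=crit.get)
    return best, crit

if __name__ == "__main__":
    # Short-range dependent example: stationary Gaussian AR(1) errors (assumed setup).
    rng = np.random.default_rng(0)
    n, phi = 200, 0.5
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    sigma = phi ** lags / (1.0 - phi ** 2)      # AR(1) autocovariance matrix
    errors = np.linalg.cholesky(sigma) @ rng.standard_normal(n)
    mean = np.where(np.arange(n) < n // 2, 0.0, 3.0)  # step-shaped mean vector
    best, _ = select_dimension(mean + errors, sigma, dims=[1, 2, 4, 8, 16])
    print("selected dimension:", best)
```

For short-range dependent errors the spectral radius stays bounded as the sample size grows, which is why a penalty proportional to the dimension is workable in that case; under long-range dependence it grows with the sample size, consistent with the summary's remark that this penalty can be too rough.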

MSC:

62G07 Density estimation
62G08 Nonparametric regression and quantile regression
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)

Software:

CAPUSHE; longmemo

References:

[1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971) 267-281. · Zbl 0283.62006
[2] Arlot, S. (2019). Minimal penalties and the slope heuristics: a survey. arXiv preprint arXiv:1901.07277. · Zbl 1437.62121
[3] Baraud, Y. (2000). Model selection for regression on a fixed design. Probability Theory and Related Fields 117 467-493. · Zbl 0997.62027
[4] Baraud, Y. (2002). Model selection for regression on a random design. ESAIM: Probability and Statistics 6 127-146. · Zbl 1059.62038
[5] Baraud, Y., Comte, F. and Viennet, G. (2001). Adaptive estimation in autoregression or β-mixing regression via model selection. The Annals of Statistics 29 839-875. · Zbl 1012.62034
[6] Baudry, J.-P., Maugis, C. and Michel, B. (2012). Slope heuristics: overview and implementation. Statistics and Computing 22 455-470. · Zbl 1322.62007
[7] Beran, J. (1994). Statistics for long-memory processes. Monographs on Statistics and Applied Probability 61. Chapman and Hall, New York. · Zbl 0869.60045
[8] Beran, J. and Feng, Y. (2002). Local polynomial fitting with long-memory, short-memory and antipersistent errors. Ann. Inst. Statist. Math. 54 291-311. · Zbl 1012.62033
[9] Beran, J. and Shumeyko, Y. (2012). On asymptotically optimal wavelet estimation of trend functions under long-range dependence. Bernoulli 18 137-176. · Zbl 1235.62124
[10] Birgé, L. and Massart, P. (2001a). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3 203-268. · Zbl 1037.62001
[11] Birgé, L. and Massart, P. (2001b). A generalized \(C_p\) criterion for Gaussian model selection. Technical report, Universités de Paris 6 et Paris 7. · Zbl 1037.62001
[12] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probability Theory and Related Fields 138 33-73. · Zbl 1112.62082
[13] Cirel’son, B. S., Ibragimov, I. A. and Sudakov, V. N. (1976). Norms of Gaussian sample functions. In Proceedings of the Third Japan-USSR Symposium on Probability Theory (Tashkent, 1975) 20-41. Lecture Notes in Math., Vol. 550. · Zbl 0359.60019
[14] Csörgő, S. and Mielniczuk, J. (1995a). Close short-range dependent sums and regression estimation. Acta Sci. Math. (Szeged) 60 177-196. · Zbl 0852.62036
[15] Csörgő, S. and Mielniczuk, J. (1995b). Distant long-range dependent sums and regression estimation. Stochastic Process. Appl. 59 143-155. · Zbl 0836.60002
[16] Csörgő, S. and Mielniczuk, J. (1995c). Nonparametric regression under long-range dependent normal errors. Ann. Statist. 23 1000-1014. · Zbl 0852.62035
[17] Dedecker, J., Dehling, H. and Taqqu, M. S. (2015). Weak convergence of the empirical process of intermittent maps in \(\mathbb{L}^2\) under long-range dependence. Stoch. Dyn. 15 29p. · Zbl 1315.60038
[18] Dedecker, J., Gouëzel, S. and Merlevède, F. (2018). Large and moderate deviations for bounded functions of slowly mixing Markov chains. Stoch. Dyn. 18 1850017, 38. · Zbl 1386.60072
[19] Dehling, H. and Taqqu, M. S. (1989). The empirical process of some long-range dependent sequences with an application to \(U\)-statistics. Ann. Statist. 17 1767-1783. · Zbl 0696.60032 · doi:10.1214/aos/1176347394
[20] DeVore, R. A. and Lorentz, G. G. (1993). Constructive approximation 303. Springer Science & Business Media.
[21] Doukhan, P., Massart, P. and Rio, E. (1994). The functional central limit theorem for strongly mixing processes. Ann. Inst. H. Poincaré Probab. Statist. 30 63-82. · Zbl 0790.60037
[22] Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika 1 211-218. · JFM 62.1075.02
[23] Gendre, X. (2008). Simultaneous estimation of the mean and the variance in heteroscedastic Gaussian regression. Electronic Journal of Statistics 2 1345-1372. · Zbl 1320.62092
[24] Gendre, X. (2014). Model selection and estimation of a component in additive regression. ESAIM: Probability and Statistics 18 77-116. · Zbl 1307.62112
[25] Gerschgorin, S. (1931). Über die Abgrenzung der Eigenwerte einer Matrix. Izv. Akad. Nauk. USSR Otd. Fiz.-Mat. Nauk 7 749-754. · JFM 57.1340.06
[26] Giraitis, L., Koul, H. L. and Surgailis, D. (2012). Large sample inference for long memory processes. Imperial College Press, London. · Zbl 1279.62016
[27] Giraitis, L., Robinson, P. M. and Surgailis, D. (2000). A model for long memory conditional heteroscedasticity. Ann. Appl. Probab. 10 1002-1024. · Zbl 1084.62516
[28] Giraud, C. (2014). Introduction to high-dimensional statistics. Chapman and Hall/CRC.
[29] Hall, P. and Hart, J. D. (1990). Nonparametric regression with long-range dependence. Stochastic Process. Appl. 36 339-351. · Zbl 0713.62048
[30] Hall, P., Kerkyacharian, G. and Picard, D. (1999). On the minimax optimality of block thresholded wavelet estimators. Statist. Sinica 9 33-49. · Zbl 0915.62028
[31] Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civil Eng. 116 770-799.
[32] Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse problems: adaptivity results. Statist. Sinica 9 51-83. · Zbl 1065.62519
[33] Johnstone, I. M. and Silverman, B. W. (1997). Wavelet threshold estimators for data with correlated noise. J. Roy. Statist. Soc. Ser. B 59 319-351. · Zbl 0886.62044 · doi:10.1111/1467-9868.00071
[34] Lerasle, M. (2011). Optimal model selection for density estimation of stationary data under various mixing conditions. The Annals of Statistics 39 1852-1877. · Zbl 1227.62018
[35] Li, L. and Xiao, Y. (2007). On the minimax optimality of block thresholded wavelet estimators with long memory data. J. Statist. Plann. Inference 137 2850-2869. · Zbl 1331.62220
[36] Mallows, C. L. (1973). Some comments on \(C_p\). Technometrics 15 661-675. · Zbl 0269.62061
[37] Mandelbrot, B. B. and Van Ness, J. W. (1968). Fractional Brownian motions, fractional noises and applications. SIAM Rev. 10 422-437. · Zbl 0179.47801 · doi:10.1137/1010093
[38] Massart, P. (2007). Concentration inequalities and model selection. Lecture Notes in Mathematics 1896. Springer, Berlin. · Zbl 1170.60006
[39] Pesee, C. (2008). Long-range dependence of financial time series data. WASET 44 163-167.
[40] Pipiras, V. and Taqqu, M. S. (2017). Long-range dependence and self-similarity. Cambridge Series in Statistical and Probabilistic Mathematics 45. Cambridge University Press, Cambridge. · Zbl 1377.60005
[41] Puplinskaitė, D. and Surgailis, D. (2010). Aggregation of a random-coefficient \(\text{AR}(1)\) process with infinite variance and idiosyncratic innovations. Adv. in Appl. Probab. 42 509-527. · Zbl 1191.62154
[42] Robinson, P. M. (1991). Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. J. Econometrics 47 67-84. · Zbl 0734.62070
[43] Robinson, P. M. (1997). Large-sample inference for nonparametric regression with dependent errors. Ann. Statist. 25 2054-2083. · Zbl 0882.62039
[44] Samorodnitsky, G. and Taqqu, M. S. (1994). Stable non-Gaussian random processes: Stochastic models with infinite variance. Stochastic Modeling. Chapman & Hall, New York. · Zbl 0925.60027
[45] Stephenson, D. B., Pavan, V. and Bojariu, R. (2000). Is the North Atlantic Oscillation a random walk? International Journal of Climatology 20 1-18.
[46] Toussoun, O. (1925). Mémoire sur L’Histoire du Nil. 3 vols. Cairo, L’Institut Français D’Archéologie Orientale.
[47] Tran, L., Roussas, G., Yakowitz, S. and Truong Van, B. (1996). Fixed-design regression for linear time series. Ann. Statist. 24 975-991. · Zbl 0862.62069
[48] Wang, Y. (1996). Function estimation via wavelet shrinkage for long-memory data. The Annals of Statistics 24 466-484. · Zbl 0859.62042
[49] Whittle, P. (1953). Estimation and information in stationary time series. Arkiv för matematik 2 423-434. · Zbl 0053.41003
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.