Doubly penalized estimation in additive regression with high-dimensional data. (English) Zbl 1436.62358

The authors consider high-dimensional nonparametric additive regression: given independent observations \((X_1,Y_1),\ldots,(X_n,Y_n)\), where each \(Y_i\in\mathbb{R}\) is a response variable and each \(X_i\in\mathbb{R}^d\) is a vector of covariates, consider the model \(Y_i=g^*(X_i)+\varepsilon_i\), where \[ g^*(x)=\sum_{j=1}^pg_j^*\left(x^{(j)}\right)\,, \] each \(\varepsilon_i\) is a noise term and, for each \(j\), \(x^{(j)}\) is a vector formed from a (small, possibly overlapping) subset of the components of \(x\in\mathbb{R}^d\), with \(p\) possibly larger than \(n\). The class of estimators of \(g^*\) studied has two penalty components: one using an empirical \(L_2\) norm to induce sparsity of the estimator, and another using functional semi-norms to induce smoothness.
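For concreteness, the doubly penalized criterion can be written schematically as follows (the tuning parameters \(\lambda,\rho\) and the norm notation are the reviewer's, not necessarily the authors'): \[ \hat g\in\operatorname*{arg\,min}_{g=\sum_{j=1}^p g_j}\ \frac{1}{n}\sum_{i=1}^n\bigl(Y_i-g(X_i)\bigr)^2+\sum_{j=1}^p\Bigl(\lambda\,\|g_j\|_n+\rho\,\|g_j\|_{\mathcal F_j}\Bigr)\,, \] where \(\|g_j\|_n=\bigl(n^{-1}\sum_{i=1}^n g_j(X_i^{(j)})^2\bigr)^{1/2}\) is the empirical \(L_2\) norm driving sparsity and \(\|\cdot\|_{\mathcal F_j}\) is a functional semi-norm (for example, a Sobolev or total-variation semi-norm) driving smoothness.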
The main results of the paper are oracle inequalities for predictive performance in this setting, giving upper bounds on the penalized predictive loss for both fixed and random designs. In the fixed-design setting, new observations are drawn with covariates from the sample \((X_1,\ldots,X_n)\), whereas in the random-design setting new covariates are drawn from the distributions of \((X_1,\ldots,X_n)\). These oracle inequalities are established under assumptions of sub-Gaussian tails for the noise, an entropy condition on the relevant functional classes, and an empirical compatibility condition. For random designs, the empirical compatibility condition may be replaced by a population compatibility condition together with a condition ensuring convergence of empirical norms. Compared with existing results in the literature, these conditions are weaker and the resulting inequalities yield better rates of convergence. The framework is also flexible, in that it allows sparsity and smoothness conditions to be decoupled.
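Schematically, an oracle inequality of this kind asserts that, with high probability, \[ \|\hat g-g^*\|_n^2+\mathrm{pen}(\hat g)\ \le\ C\,\inf_{g=\sum_{j=1}^p g_j}\Bigl\{\|g-g^*\|_n^2+\mathrm{pen}(g)\Bigr\}\,, \] where \(\mathrm{pen}(\cdot)\) collects the sparsity and smoothness penalty terms at suitable tuning levels and \(C\) is a constant; this display is meant only to illustrate the general shape of such bounds, not as a verbatim statement of the authors' theorems.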
The authors consider the special cases of Sobolev and bounded-variation spaces, in which the explicit rates of convergence obtained from the oracle inequalities are shown to match minimax lower bounds, and also give results on convergence of empirical norms that may be of independent interest.
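For context, in the case of univariate components of Sobolev smoothness \(\alpha\), the minimax rate established in the cited literature (e.g., by Raskutti, Wainwright and Yu, and by Yuan and Zhou) for estimating an additive function with \(s\) active components out of \(p\) takes, up to constants, the form \[ s\,n^{-2\alpha/(2\alpha+1)}+\frac{s\log(p/s)}{n} \] in squared \(L_2\) error; the rates obtained in the paper match bounds of this shape (the notation \(s,\alpha\) is the reviewer's paraphrase of the standard statement).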

MSC:

62J12 Generalized linear models (logistic models)
62E20 Asymptotic distribution theory in statistics
62F25 Parametric tolerance and confidence regions
62F35 Robustness and adaptive procedures (parametric inference)
62J05 Linear regression; mixed models
62J10 Analysis of variance and covariance (ANOVA)
46E22 Hilbert spaces with reproducing kernels (= (proper) functional Hilbert spaces, including de Branges-Rovnyak and other structured spaces)
62G08 Nonparametric regression and quantile regression

Software:

gss

References:

[1] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022 · doi:10.1214/08-AOS620
[2] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169-194. · Zbl 1146.62028 · doi:10.1214/07-EJS008
[3] Dalalyan, A., Ingster, Y. and Tsybakov, A. B. (2014). Statistical inference in compound functional models. Probab. Theory Related Fields 158 513-532. · Zbl 1285.62041 · doi:10.1007/s00440-013-0487-y
[4] DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 303. Springer, Berlin. · Zbl 0797.41016
[5] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971-988. · Zbl 1055.62078 · doi:10.3150/bj/1106314846
[6] Gu, C. (2002). Smoothing Spline ANOVA Models. Springer Series in Statistics. Springer, New York. · Zbl 1051.62034
[7] Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability 43. CRC Press, London. · Zbl 0747.62061
[8] Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist. 38 2282-2313. · Zbl 1202.62051 · doi:10.1214/09-AOS781
[9] Kim, S.-J., Koh, K., Boyd, S. and Gorinevsky, D. (2009). \(l_1\) trend filtering. SIAM Rev. 51 339-360. · Zbl 1171.37033 · doi:10.1137/070690274
[10] Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist. 39 2302-2329. · Zbl 1231.62097 · doi:10.1214/11-AOS894
[11] Koltchinskii, V. and Yuan, M. (2010). Sparsity in multiple kernel learning. Ann. Statist. 38 3660-3695. · Zbl 1204.62086 · doi:10.1214/10-AOS825
[12] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer, Berlin. · Zbl 0748.60004
[13] Lin, Y. and Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. Ann. Statist. 34 2272-2297. · Zbl 1106.62041 · doi:10.1214/009053606000000722
[14] Lorentz, G. G., Golitschek, M. V. and Makovoz, Y. (1996). Constructive Approximation: Advanced Problems. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 304. Springer, Berlin. · Zbl 0910.41001
[15] Mammen, E. (1991). Nonparametric regression under qualitative smoothness assumptions. Ann. Statist. 19 741-759. · Zbl 0737.62039 · doi:10.1214/aos/1176348118
[16] Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387-413. · Zbl 0871.62040 · doi:10.1214/aos/1034276635
[17] Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779-3821. · Zbl 1360.62186 · doi:10.1214/09-AOS692
[18] Müller, P. and van de Geer, S. (2015). The partial linear model in high dimensions. Scand. J. Stat. 42 580-608. · Zbl 1364.62196
[19] Nirenberg, L. (1966). An extended interpolation inequality. Ann. Sc. Norm. Super. Pisa Cl. Sci. (3) 20 733-737. · Zbl 0163.29905
[20] Petersen, A., Witten, D. and Simon, N. (2016). Fused lasso additive model. J. Comput. Graph. Statist. 25 1005-1025.
[21] Raskutti, G., Wainwright, M. J. and Yu, B. (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. J. Mach. Learn. Res. 13 389-427. · Zbl 1283.62071
[22] Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 1009-1030. · Zbl 1411.62107 · doi:10.1111/j.1467-9868.2009.00718.x
[23] Sadhanala, V. and Tibshirani, R. J. (2017). Additive models with trend filtering. Preprint. Available at arXiv:1702.05037. · Zbl 1436.62450
[24] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040-1053. · Zbl 0511.62048 · doi:10.1214/aos/1176345969
[25] Stone, C. J. (1985). Additive regression and other nonparametric models. Ann. Statist. 13 689-705. · Zbl 0605.62065 · doi:10.1214/aos/1176349548
[26] Suzuki, T. and Sugiyama, M. (2013). Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness. Ann. Statist. 41 1381-1405. · Zbl 1273.62090 · doi:10.1214/13-AOS1095
[27] Tan, Z. and Zhang, C.-H. (2019). Supplement to "Doubly penalized estimation in additive regression with high-dimensional data." DOI:10.1214/18-AOS1757SUPP. · Zbl 1436.62358
[28] Tibshirani, R. J. (2014). Adaptive piecewise polynomial estimation via trend filtering. Ann. Statist. 42 285-323. · Zbl 1307.62118 · doi:10.1214/13-AOS1189
[29] van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press, Cambridge. · Zbl 1179.62073
[30] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360-1392. · Zbl 1327.62425 · doi:10.1214/09-EJS506
[31] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York. · Zbl 0862.60002
[32] Yang, T. and Tan, Z. (2018). Backfitting algorithms for total-variation and empirical-norm penalized additive modelling with high-dimensional data. Stat 7 e198. · Zbl 07851084
[33] Yang, Y. and Tokdar, S. T. (2015). Minimax-optimal nonparametric regression in high dimensions. Ann. Statist. 43 652-674. · Zbl 1312.62052 · doi:10.1214/14-AOS1289
[34] Yuan, M. and Zhou, D.-X. (2016). Minimax optimal rates of estimation in high dimensional additive models. Ann. Statist. 44 2564-2593. · Zbl 1360.62200 · doi:10.1214/15-AOS1422