-
Tuning parameter selection in econometrics
Authors:
Denis Chetverikov
Abstract:
I review some of the main methods for selecting tuning parameters in nonparametric and $\ell_1$-penalized estimation. For the nonparametric estimation, I consider the methods of Mallows, Stein, Lepski, cross-validation, penalization, and aggregation in the context of series estimation. For the $\ell_1$-penalized estimation, I consider the methods based on the theory of self-normalized moderate dev…
▽ More
I review some of the main methods for selecting tuning parameters in nonparametric and $\ell_1$-penalized estimation. For the nonparametric estimation, I consider the methods of Mallows, Stein, Lepski, cross-validation, penalization, and aggregation in the context of series estimation. For the $\ell_1$-penalized estimation, I consider the methods based on the theory of self-normalized moderate deviations, bootstrap, Stein's unbiased risk estimation, and cross-validation in the context of Lasso estimation. I explain the intuition behind each of the methods and discuss their comparative advantages. I also give some extensions.
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
One Factor to Bind the Cross-Section of Returns
Authors:
Nicola Borri,
Denis Chetverikov,
Yukun Liu,
Aleh Tsyvinski
Abstract:
We propose a new non-linear single-factor asset pricing model $r_{it}=h(f_{t}λ_{i})+ε_{it}$. Despite its parsimony, this model represents exactly any non-linear model with an arbitrary number of factors and loadings -- a consequence of the Kolmogorov-Arnold representation theorem. It features only one pricing component $h(f_{t}λ_{I})$, comprising a nonparametric link function of the time-dependent…
▽ More
We propose a new non-linear single-factor asset pricing model $r_{it}=h(f_{t}λ_{i})+ε_{it}$. Despite its parsimony, this model represents exactly any non-linear model with an arbitrary number of factors and loadings -- a consequence of the Kolmogorov-Arnold representation theorem. It features only one pricing component $h(f_{t}λ_{I})$, comprising a nonparametric link function of the time-dependent factor and factor loading that we jointly estimate with sieve-based estimators. Using 171 assets across major classes, our model delivers superior cross-sectional performance with a low-dimensional approximation of the link function. Most known finance and macro factors become insignificant controlling for our single-factor.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
csranks: An R Package for Estimation and Inference Involving Ranks
Authors:
Denis Chetverikov,
Magne Mogstad,
Pawel Morgen,
Joseph Romano,
Azeem Shaikh,
Daniel Wilhelm
Abstract:
This article introduces the R package csranks for estimation and inference involving ranks. First, we review methods for the construction of confidence sets for ranks, namely marginal and simultaneous confidence sets as well as confidence sets for the identities of the tau-best. Second, we review methods for estimation and inference in regressions involving ranks. Third, we describe the implementa…
▽ More
This article introduces the R package csranks for estimation and inference involving ranks. First, we review methods for the construction of confidence sets for ranks, namely marginal and simultaneous confidence sets as well as confidence sets for the identities of the tau-best. Second, we review methods for estimation and inference in regressions involving ranks. Third, we describe the implementation of these methods in csranks and illustrate their usefulness in two examples: one about the quantification of uncertainty in the PISA ranking of countries and one about the measurement of intergenerational mobility using rank-rank regressions.
△ Less
Submitted 26 January, 2024;
originally announced January 2024.
-
Logit-based alternatives to two-stage least squares
Authors:
Denis Chetverikov,
Jinyong Hahn,
Zhipeng Liao,
Shuyang Sheng
Abstract:
We propose logit-based IV and augmented logit-based IV estimators that serve as alternatives to the traditionally used 2SLS estimator in the model where both the endogenous treatment variable and the corresponding instrument are binary. Our novel estimators are as easy to compute as the 2SLS estimator but have an advantage over the 2SLS estimator in terms of causal interpretability. In particular,…
▽ More
We propose logit-based IV and augmented logit-based IV estimators that serve as alternatives to the traditionally used 2SLS estimator in the model where both the endogenous treatment variable and the corresponding instrument are binary. Our novel estimators are as easy to compute as the 2SLS estimator but have an advantage over the 2SLS estimator in terms of causal interpretability. In particular, in certain cases where the probability limits of both our estimators and the 2SLS estimator take the form of weighted-average treatment effects, our estimators are guaranteed to yield non-negative weights whereas the 2SLS estimator is not.
△ Less
Submitted 16 December, 2023;
originally announced December 2023.
-
Inference for Rank-Rank Regressions
Authors:
Denis Chetverikov,
Daniel Wilhelm
Abstract:
The slope coefficient in a rank-rank regression is a popular measure of intergenerational mobility. In this article, we first show that commonly used inference methods for this slope parameter are invalid. Second, when the underlying distribution is not continuous, the OLS estimator and its asymptotic distribution may be highly sensitive to how ties in the ranks are handled. Motivated by these fin…
▽ More
The slope coefficient in a rank-rank regression is a popular measure of intergenerational mobility. In this article, we first show that commonly used inference methods for this slope parameter are invalid. Second, when the underlying distribution is not continuous, the OLS estimator and its asymptotic distribution may be highly sensitive to how ties in the ranks are handled. Motivated by these findings we develop a new asymptotic theory for the OLS estimator in a general class of rank-rank regression specifications without imposing any assumptions about the continuity of the underlying distribution. We then extend the asymptotic theory to other regressions involving ranks that have been used in empirical work. Finally, we apply our new inference methods to two empirical studies on intergenerational mobility, highlighting the practical implications of our theoretical findings.
△ Less
Submitted 2 July, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Standard errors when a regressor is randomly assigned
Authors:
Denis Chetverikov,
Jinyong Hahn,
Zhipeng Liao,
Andres Santos
Abstract:
We examine asymptotic properties of the OLS estimator when the values of the regressor of interest are assigned randomly and independently of other regressors. We find that the OLS variance formula in this case is often simplified, sometimes substantially. In particular, when the regressor of interest is independent not only of other regressors but also of the error term, the textbook homoskedasti…
▽ More
We examine asymptotic properties of the OLS estimator when the values of the regressor of interest are assigned randomly and independently of other regressors. We find that the OLS variance formula in this case is often simplified, sometimes substantially. In particular, when the regressor of interest is independent not only of other regressors but also of the error term, the textbook homoskedastic variance formula is valid even if the error term and auxiliary regressors exhibit a general dependence structure. In the context of randomized controlled trials, this conclusion holds in completely randomized experiments with constant treatment effects. When the error term is heteroscedastic with respect to the regressor of interest, the variance formula has to be adjusted not only for heteroscedasticity but also for correlation structure of the error term. However, even in the latter case, some simplifications are possible as only a part of the correlation structure of the error term should be taken into account. In the context of randomized control trials, this implies that the textbook homoscedastic variance formula is typically not valid if treatment effects are heterogenous but heteroscedasticity-robust variance formulas are valid if treatment effects are independent across units, even if the error term exhibits a general dependence structure. In addition, we extend the results to the case when the regressor of interest is assigned randomly at a group level, such as in randomized control trials with treatment assignment determined at a group (e.g., school/village) level.
△ Less
Submitted 17 March, 2023;
originally announced March 2023.
-
Spectral and post-spectral estimators for grouped panel data models
Authors:
Denis Chetverikov,
Elena Manresa
Abstract:
In this paper, we develop spectral and post-spectral estimators for grouped panel data models. Both estimators are consistent in the asymptotics where the number of observations $N$ and the number of time periods $T$ simultaneously grow large. In addition, the post-spectral estimator is $\sqrt{NT}$-consistent and asymptotically normal with mean zero under the assumption of well-separated groups ev…
▽ More
In this paper, we develop spectral and post-spectral estimators for grouped panel data models. Both estimators are consistent in the asymptotics where the number of observations $N$ and the number of time periods $T$ simultaneously grow large. In addition, the post-spectral estimator is $\sqrt{NT}$-consistent and asymptotically normal with mean zero under the assumption of well-separated groups even if $T$ is growing much slower than $N$. The post-spectral estimator has, therefore, theoretical properties that are comparable to those of the grouped fixed-effect estimator developed by Bonhomme and Manresa (2015). In contrast to the grouped fixed-effect estimator, however, our post-spectral estimator is computationally straightforward.
△ Less
Submitted 29 December, 2022; v1 submitted 26 December, 2022;
originally announced December 2022.
-
High-dimensional Data Bootstrap
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato,
Yuta Koike
Abstract:
This article reviews recent progress in high-dimensional bootstrap. We first review high-dimensional central limit theorems for distributions of sample mean vectors over the rectangles, bootstrap consistency results in high dimensions, and key techniques used to establish those results. We then review selected applications of high-dimensional bootstrap: construction of simultaneous confidence sets…
▽ More
This article reviews recent progress in high-dimensional bootstrap. We first review high-dimensional central limit theorems for distributions of sample mean vectors over the rectangles, bootstrap consistency results in high dimensions, and key techniques used to establish those results. We then review selected applications of high-dimensional bootstrap: construction of simultaneous confidence sets for high-dimensional vector parameters, multiple hypothesis testing via stepdown, post-selection inference, intersection bounds for partially identified parameters, and inference on best policies in policy evaluation. Finally, we also comment on a couple of future research directions.
△ Less
Submitted 19 May, 2022;
originally announced May 2022.
-
Weighted-average quantile regression
Authors:
Denis Chetverikov,
Yukun Liu,
Aleh Tsyvinski
Abstract:
In this paper, we introduce the weighted-average quantile regression framework, $\int_0^1 q_{Y|X}(u)ψ(u)du = X'β$, where $Y$ is a dependent variable, $X$ is a vector of covariates, $q_{Y|X}$ is the quantile function of the conditional distribution of $Y$ given $X$, $ψ$ is a weighting function, and $β$ is a vector of parameters. We argue that this framework is of interest in many applied settings a…
▽ More
In this paper, we introduce the weighted-average quantile regression framework, $\int_0^1 q_{Y|X}(u)ψ(u)du = X'β$, where $Y$ is a dependent variable, $X$ is a vector of covariates, $q_{Y|X}$ is the quantile function of the conditional distribution of $Y$ given $X$, $ψ$ is a weighting function, and $β$ is a vector of parameters. We argue that this framework is of interest in many applied settings and develop an estimator of the vector of parameters $β$. We show that our estimator is $\sqrt T$-consistent and asymptotically normal with mean zero and easily estimable covariance matrix, where $T$ is the size of available sample. We demonstrate the usefulness of our estimator by applying it in two empirical settings. In the first setting, we focus on financial data and study the factor structures of the expected shortfalls of the industry portfolios. In the second setting, we focus on wage data and study inequality and social welfare dependence on commonly used individual characteristics.
△ Less
Submitted 6 March, 2022;
originally announced March 2022.
-
Selecting Penalty Parameters of High-Dimensional M-Estimators using Bootstrapping after Cross-Validation
Authors:
Denis Chetverikov,
Jesper Riis-Vestergaard Sørensen
Abstract:
We develop a new method for selecting the penalty parameter for $\ell_1$-penalized M-estimators in high dimensions, which we refer to as bootstrapping after cross-validation. We derive rates of convergence for the corresponding $\ell_1$-penalized M-estimator and also for the post-$\ell_1$-penalized M-estimator, which refits the non-zero parameters of the former estimator without penalty in the cri…
▽ More
We develop a new method for selecting the penalty parameter for $\ell_1$-penalized M-estimators in high dimensions, which we refer to as bootstrapping after cross-validation. We derive rates of convergence for the corresponding $\ell_1$-penalized M-estimator and also for the post-$\ell_1$-penalized M-estimator, which refits the non-zero parameters of the former estimator without penalty in the criterion function. We demonstrate via simulations that our method is not dominated by cross-validation in terms of estimation errors and outperforms cross-validation in terms of inference. As an illustration, we revisit Fryer Jr (2019), who investigated racial differences in police use of force, and confirm his findings.
△ Less
Submitted 14 August, 2023; v1 submitted 10 April, 2021;
originally announced April 2021.
-
Nearly optimal central limit theorem and bootstrap approximations in high dimensions
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Yuta Koike
Abstract:
In this paper, we derive new, nearly optimal bounds for the Gaussian approximation to scaled averages of $n$ independent high-dimensional centered random vectors $X_1,\dots,X_n$ over the class of rectangles in the case when the covariance matrix of the scaled average is non-degenerate. In the case of bounded $X_i$'s, the implied bound for the Kolmogorov distance between the distribution of the sca…
▽ More
In this paper, we derive new, nearly optimal bounds for the Gaussian approximation to scaled averages of $n$ independent high-dimensional centered random vectors $X_1,\dots,X_n$ over the class of rectangles in the case when the covariance matrix of the scaled average is non-degenerate. In the case of bounded $X_i$'s, the implied bound for the Kolmogorov distance between the distribution of the scaled average and the Gaussian vector takes the form $$C (B^2_n \log^3 d/n)^{1/2} \log n,$$ where $d$ is the dimension of the vectors and $B_n$ is a uniform envelope constant on components of $X_i$'s. This bound is sharp in terms of $d$ and $B_n$, and is nearly (up to $\log n$) sharp in terms of the sample size $n$. In addition, we show that similar bounds hold for the multiplier and empirical bootstrap approximations. Moreover, we establish bounds that allow for unbounded $X_i$'s, formulated solely in terms of moments of $X_i$'s. Finally, we demonstrate that the bounds can be further improved in some special smooth and zero-skewness cases.
△ Less
Submitted 12 May, 2021; v1 submitted 17 December, 2020;
originally announced December 2020.
-
Improved Central Limit Theorem and bootstrap approximations in high dimensions
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato,
Yuta Koike
Abstract:
This paper deals with the Gaussian and bootstrap approximations to the distribution of the max statistic in high dimensions. This statistic takes the form of the maximum over components of the sum of independent random vectors and its distribution plays a key role in many high-dimensional econometric problems. Using a novel iterative randomized Lindeberg method, the paper derives new bounds for th…
▽ More
This paper deals with the Gaussian and bootstrap approximations to the distribution of the max statistic in high dimensions. This statistic takes the form of the maximum over components of the sum of independent random vectors and its distribution plays a key role in many high-dimensional econometric problems. Using a novel iterative randomized Lindeberg method, the paper derives new bounds for the distributional approximation errors. These new bounds substantially improve upon existing ones and simultaneously allow for a larger class of bootstrap methods.
△ Less
Submitted 29 May, 2022; v1 submitted 22 December, 2019;
originally announced December 2019.
-
Deep Face Image Retrieval: a Comparative Study with Dictionary Learning
Authors:
Ahmad S. Tarawneh,
Ahmad B. A. Hassanat,
Ceyhun Celik,
Dmitry Chetverikov,
M. Sohel Rahman,
Chaman Verma
Abstract:
Facial image retrieval is a challenging task since faces have many similar features (areas), which makes it difficult for the retrieval systems to distinguish faces of different people. With the advent of deep learning, deep networks are often applied to extract powerful features that are used in many areas of computer vision. This paper investigates the application of different deep learning mode…
▽ More
Facial image retrieval is a challenging task since faces have many similar features (areas), which makes it difficult for the retrieval systems to distinguish faces of different people. With the advent of deep learning, deep networks are often applied to extract powerful features that are used in many areas of computer vision. This paper investigates the application of different deep learning models for face image retrieval, namely, Alexlayer6, Alexlayer7, VGG16layer6, VGG16layer7, VGG19layer6, and VGG19layer7, with two types of dictionary learning techniques, namely $K$-means and $K$-SVD. We also investigate some coefficient learning techniques such as the Homotopy, Lasso, Elastic Net and SSF and their effect on the face retrieval system. The comparative results of the experiments conducted on three standard face image datasets show that the best performers for face image retrieval are Alexlayer7 with $K$-means and SSF, Alexlayer6 with $K$-SVD and SSF, and Alexlayer6 with $K$-means and SSF. The APR and ARR of these methods were further compared to some of the state of the art methods based on local descriptors. The experimental results show that deep learning outperforms most of those methods and therefore can be recommended for use in practice of face image retrieval
△ Less
Submitted 13 December, 2018;
originally announced December 2018.
-
Detailed Investigation of Deep Features with Sparse Representation and Dimensionality Reduction in CBIR: A Comparative Study
Authors:
Ahmad S. Tarawneh,
Ceyhun Celik,
Ahmad B. Hassanat,
Dmitry Chetverikov
Abstract:
Research on content-based image retrieval (CBIR) has been under development for decades, and numerous methods have been competing to extract the most discriminative features for improved representation of the image content. Recently, deep learning methods have gained attention in computer vision, including CBIR. In this paper, we present a comparative investigation of different features, including…
▽ More
Research on content-based image retrieval (CBIR) has been under development for decades, and numerous methods have been competing to extract the most discriminative features for improved representation of the image content. Recently, deep learning methods have gained attention in computer vision, including CBIR. In this paper, we present a comparative investigation of different features, including low-level and high-level features, for CBIR. We compare the performance of CBIR systems using different deep features with state-of-the-art low-level features such as SIFT, SURF, HOG, LBP, and LTP, using different dictionaries and coefficient learning techniques. Furthermore, we conduct comparisons with a set of primitive and popular features that have been used in this field, including colour histograms and Gabor features. We also investigate the discriminative power of deep features using certain similarity measures under different validation approaches. Furthermore, we investigate the effects of the dimensionality reduction of deep features on the performance of CBIR systems using principal component analysis, discrete wavelet transform, and discrete cosine transform. Unprecedentedly, the experimental results demonstrate high (95\% and 93\%) mean average precisions when using the VGG-16 FC7 deep features of Corel-1000 and Coil-20 datasets with 10-D and 20-D K-SVD, respectively.
△ Less
Submitted 23 November, 2018;
originally announced November 2018.
-
High-Dimensional Econometrics and Regularized GMM
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Christian Hansen,
Kengo Kato
Abstract:
This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models. High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size. We first present results in a framework where estimators of parameters of interest may be represented directly as approximate means.…
▽ More
This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models. High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size. We first present results in a framework where estimators of parameters of interest may be represented directly as approximate means. Within this context, we review fundamental results including high-dimensional central limit theorems, bootstrap approximation of high-dimensional limit distributions, and moderate deviation theory. We also review key concepts underlying inference when many parameters are of interest such as multiple testing with family-wise error rate or false discovery rate control. We then turn to a general high-dimensional minimum distance framework with a special focus on generalized method of moments problems where we present results for estimation and inference about model parameters. The presented results cover a wide array of econometric applications, and we discuss several leading special cases including high-dimensional linear regression and linear instrumental variables models to illustrate the general results.
△ Less
Submitted 10 June, 2018; v1 submitted 5 June, 2018;
originally announced June 2018.
-
Pilot Comparative Study of Different Deep Features for Palmprint Identification in Low-Quality Images
Authors:
A. S. Tarawneh,
D. Chetverikov,
A. B. Hassanat
Abstract:
Deep Convolutional Neural Networks (CNNs) are widespread, efficient tools of visual recognition. In this paper, we present a comparative study of three popular pre-trained CNN models: AlexNet, VGG-16 and VGG-19. We address the problem of palmprint identification in low-quality imagery and apply Support Vector Machines (SVMs) with all of the compared models. For the comparison, we use the MOHI palm…
▽ More
Deep Convolutional Neural Networks (CNNs) are widespread, efficient tools of visual recognition. In this paper, we present a comparative study of three popular pre-trained CNN models: AlexNet, VGG-16 and VGG-19. We address the problem of palmprint identification in low-quality imagery and apply Support Vector Machines (SVMs) with all of the compared models. For the comparison, we use the MOHI palmprint image database whose images are characterized by low contrast, shadows, and varying illumination, scale, translation and rotation. Another, high-quality database called COEP is also considered to study the recognition gap between high-quality and low-quality imagery. Our experiments show that the deeper pre-trained CNN models, e.g., VGG-16 and VGG-19, tend to extract highly distinguishable features that recognize low-quality palmprints more efficiently than the less deep networks such as AlexNet. Furthermore, our experiments on the two databases using various models demonstrate that the features extracted from lower-level fully connected layers provide higher recognition rates than higher-layer features. Our results indicate that different pre-trained models can be efficiently used in touchless identification systems with low-quality palmprint images.
△ Less
Submitted 9 April, 2018;
originally announced April 2018.
-
Detailed proof of Nazarov's inequality
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
The purpose of this note is to provide a detailed proof of Nazarov's inequality stated in Lemma A.1 in Chernozhukov, Chetverikov, and Kato (2017, Annals of Probability).
The purpose of this note is to provide a detailed proof of Nazarov's inequality stated in Lemma A.1 in Chernozhukov, Chetverikov, and Kato (2017, Annals of Probability).
△ Less
Submitted 29 November, 2017;
originally announced November 2017.
-
Double/Debiased/Neyman Machine Learning of Treatment Effects
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Mert Demirer,
Esther Duflo,
Christian Hansen,
Whitney Newey
Abstract:
Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) provide a generic double/de-biased machine learning (DML) approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using a new generation of nonparametric fitting methods for high-dimensional data, called machin…
▽ More
Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) provide a generic double/de-biased machine learning (DML) approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using a new generation of nonparametric fitting methods for high-dimensional data, called machine learning methods. In this note, we illustrate the application of this method in the context of estimating average treatment effects (ATE) and average treatment effects on the treated (ATTE) using observational data. A more general discussion and references to the existing literature are available in Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016).
△ Less
Submitted 30 January, 2017;
originally announced January 2017.
-
Double/Debiased Machine Learning for Treatment and Causal Parameters
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Mert Demirer,
Esther Duflo,
Christian Hansen,
Whitney Newey,
James Robins
Abstract:
Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact,…
▽ More
Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimates of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly due to the regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The score is then used to build a de-biased estimator of the target parameter which typically will converge at the fastest possible 1/root(n) rate and be approximately unbiased and normal, and from which valid confidence intervals for these parameters of interest may be constructed. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. In order to avoid overfitting, our construction also makes use of the K-fold sample splitting, which we call cross-fitting. This allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forest, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregators of these methods.
△ Less
Submitted 12 December, 2017; v1 submitted 29 July, 2016;
originally announced August 2016.
-
On cross-validated Lasso in high dimensions
Authors:
Denis Chetverikov,
Zhipeng Liao,
Victor Chernozhukov
Abstract:
In this paper, we derive non-asymptotic error bounds for the Lasso estimator when the penalty parameter for the estimator is chosen using $K$-fold cross-validation. Our bounds imply that the cross-validated Lasso estimator has nearly optimal rates of convergence in the prediction, $L^2$, and $L^1$ norms. For example, we show that in the model with the Gaussian noise and under fairly general assump…
▽ More
In this paper, we derive non-asymptotic error bounds for the Lasso estimator when the penalty parameter for the estimator is chosen using $K$-fold cross-validation. Our bounds imply that the cross-validated Lasso estimator has nearly optimal rates of convergence in the prediction, $L^2$, and $L^1$ norms. For example, we show that in the model with the Gaussian noise and under fairly general assumptions on the candidate set of values of the penalty parameter, the estimation error of the cross-validated Lasso estimator converges to zero in the prediction norm with the $\sqrt{s\log p / n}\times \sqrt{\log(p n)}$ rate, where $n$ is the sample size of available data, $p$ is the number of covariates, and $s$ is the number of non-zero coefficients in the model. Thus, the cross-validated Lasso estimator achieves the fastest possible rate of convergence in the prediction norm up to a small logarithmic factor $\sqrt{\log(p n)}$, and similar conclusions apply for the convergence rate both in $L^2$ and in $L^1$ norms. Importantly, our results cover the case when $p$ is (potentially much) larger than $n$ and also allow for the case of non-Gaussian noise. Our paper therefore serves as a justification for the widely spread practice of using cross-validation as a method to choose the penalty parameter for the Lasso estimator.
△ Less
Submitted 6 February, 2020; v1 submitted 7 May, 2016;
originally announced May 2016.
-
Uniformly Valid Post-Regularization Confidence Regions for Many Functional Parameters in Z-Estimation Framework
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Ying Wei
Abstract:
In this paper we develop procedures to construct simultaneous confidence bands for $\tilde p$ potentially infinite-dimensional parameters after model selection for general moment condition models where $\tilde p$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the $\tilde p$ parameters is a functio…
▽ More
In this paper we develop procedures to construct simultaneous confidence bands for $\tilde p$ potentially infinite-dimensional parameters after model selection for general moment condition models where $\tilde p$ is potentially much larger than the sample size of available data, $n$. This allows us to cover settings with functional response data where each of the $\tilde p$ parameters is a function. The procedure is based on the construction of score functions that satisfy certain orthogonality condition. The proposed simultaneous confidence bands rely on uniform central limit theorems for high-dimensional vectors (and not on Donsker arguments as we allow for $\tilde p \gg n$). To construct the bands, we employ a multiplier bootstrap procedure which is computationally efficient as it only involves resampling the estimated score functions (and does not require resolving the high-dimensional optimization problems). We formally apply the general theory to inference on regression coefficient process in the distribution regression model with a logistic link, where two implementations are analyzed in detail. Simulations and an application to real data are provided to help illustrate the applicability of the results.
△ Less
Submitted 3 February, 2019; v1 submitted 23 December, 2015;
originally announced December 2015.
-
Nonparametric instrumental variable estimation under monotonicity
Authors:
Denis Chetverikov,
Daniel Wilhelm
Abstract:
The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark co…
▽ More
The ill-posedness of the inverse problem of recovering a regression function in a nonparametric instrumental variable model leads to estimators that may suffer from a very slow, logarithmic rate of convergence. In this paper, we show that restricting the problem to models with monotone regression functions and monotone instruments significantly weakens the ill-posedness of the problem. In stark contrast to the existing literature, the presence of a monotone instrument implies boundedness of our measure of ill-posedness when restricted to the space of monotone functions. Based on this result we derive a novel non-asymptotic error bound for the constrained estimator that imposes monotonicity of the regression function. For a given sample size, the bound is independent of the degree of ill-posedness as long as the regression function is not too steep. As an implication, the bound allows us to show that the constrained estimator converges at a fast, polynomial rate, independently of the degree of ill-posedness, in a large, but slowly shrinking neighborhood of constant functions. Our simulation study demonstrates significant finite-sample performance gains from imposing monotonicity even when the regression function is rather far from being a constant. We apply the constrained estimator to the problem of estimating gasoline demand functions from U.S. data.
△ Less
Submitted 19 July, 2015;
originally announced July 2015.
-
Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
We derive strong approximations to the supremum of the non-centered empirical process indexed by a possibly unbounded VC-type class of functions by the suprema of the Gaussian and bootstrap processes. The bounds of these approximations are non-asymptotic, which allows us to work with classes of functions whose complexity increases with the sample size. The construction of couplings is not of the H…
▽ More
We derive strong approximations to the supremum of the non-centered empirical process indexed by a possibly unbounded VC-type class of functions by the suprema of the Gaussian and bootstrap processes. The bounds of these approximations are non-asymptotic, which allows us to work with classes of functions whose complexity increases with the sample size. The construction of couplings is not of the Hungarian type and is instead based on the Slepian-Stein methods and Gaussian comparison inequalities. The increasing complexity of classes of functions and non-centrality of the processes make the results useful for applications in modern nonparametric statistics (Giné and Nickl, 2015), in particular allowing us to study the power properties of nonparametric tests using Gaussian and bootstrap approximations.
△ Less
Submitted 6 September, 2015; v1 submitted 1 February, 2015;
originally announced February 2015.
-
Central Limit Theorems and Bootstrap in High Dimensions
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i\in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or, more…
▽ More
This paper derives central limit and bootstrap theorems for probabilities that sums of centered high-dimensional random vectors hit hyperrectangles and sparsely convex sets. Specifically, we derive Gaussian and bootstrap approximations for probabilities $\Pr(n^{-1/2}\sum_{i=1}^n X_i\in A)$ where $X_1,\dots,X_n$ are independent random vectors in $\mathbb{R}^p$ and $A$ is a hyperrectangle, or, more generally, a sparsely convex set, and show that the approximation error converges to zero even if $p=p_n\to \infty$ as $n \to \infty$ and $p \gg n$; in particular, $p$ can be as large as $O(e^{Cn^c})$ for some constants $c,C>0$. The result holds uniformly over all hyperrectangles, or more generally, sparsely convex sets, and does not require any restriction on the correlation structure among coordinates of $X_i$. Sparsely convex sets are sets that can be represented as intersections of many convex sets whose indicator functions depend only on a small subset of their arguments, with hyperrectangles being a special case.
△ Less
Submitted 8 March, 2016; v1 submitted 11 December, 2014;
originally announced December 2014.
-
Inference on causal and structural parameters using many moment inequalities
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
This paper considers the problem of testing many moment inequalities where the number of moment inequalities, denoted by $p$, is possibly much larger than the sample size $n$. There is a variety of economic applications where solving this problem allows to carry out inference on causal and structural parameters, a notable example is the market structure model of Ciliberto and Tamer (2009) where…
▽ More
This paper considers the problem of testing many moment inequalities where the number of moment inequalities, denoted by $p$, is possibly much larger than the sample size $n$. There is a variety of economic applications where solving this problem allows to carry out inference on causal and structural parameters, a notable example is the market structure model of Ciliberto and Tamer (2009) where $p=2^{m+1}$ with $m$ being the number of firms that could possibly enter the market. We consider the test statistic given by the maximum of $p$ Studentized (or $t$-type) inequality-specific statistics, and analyze various ways to compute critical values for the test statistic. Specifically, we consider critical values based upon (i) the union bound combined with a moderate deviation inequality for self-normalized sums, (ii) the multiplier and empirical bootstraps, and (iii) two-step and three-step variants of (i) and (ii) by incorporating the selection of uninformative inequalities that are far from being binding and a novel selection of weakly informative inequalities that are potentially binding but do not provide first order information. We prove validity of these methods, showing that under mild conditions, they lead to tests with the error in size decreasing polynomially in $n$ while allowing for $p$ being much larger than $n$, indeed $p$ can be of order $\exp (n^{c})$ for some $c > 0$. Importantly, all these results hold without any restriction on the correlation structure between $p$ Studentized statistics, and also hold uniformly with respect to suitably large classes of underlying distributions. Moreover, in the online supplement, we show validity of a test based on the block multiplier bootstrap in the case of dependent data under some general mixing conditions.
△ Less
Submitted 18 October, 2018; v1 submitted 29 December, 2013;
originally announced December 2013.
-
Anti-concentration and honest, adaptive confidence bands
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Giné and Nickl [Probab. Theory Related Fields 143 (2009) 569-596]. This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equi…
▽ More
Modern construction of uniform confidence bands for nonparametric densities (and other functions) often relies on the classical Smirnov-Bickel-Rosenblatt (SBR) condition; see, for example, Giné and Nickl [Probab. Theory Related Fields 143 (2009) 569-596]. This condition requires the existence of a limit distribution of an extreme value type for the supremum of a studentized empirical process (equivalently, for the supremum of a Gaussian process with the same covariance function as that of the studentized empirical process). The principal contribution of this paper is to remove the need for this classical condition. We show that a considerably weaker sufficient condition is derived from an anti-concentration property of the supremum of the approximating Gaussian process, and we derive an inequality leading to such a property for separable Gaussian processes. We refer to the new condition as a generalized SBR condition. Our new result shows that the supremum does not concentrate too fast around any value. We then apply this result to derive a Gaussian multiplier bootstrap procedure for constructing honest confidence bands for nonparametric density estimators (this result can be applied in other nonparametric problems as well). An essential advantage of our approach is that it applies generically even in those cases where the limit distribution of the supremum of the studentized empirical process does not exist (or is unknown). This is of particular importance in problems where resolution levels or other tuning parameters have been chosen in a data-driven fashion, which is needed for adaptive constructions of the confidence bands. Finally, of independent interest is our introduction of a new, practical version of Lepski's method, which computes the optimal, nonconservative resolution levels via a Gaussian multiplier bootstrap method.
△ Less
Submitted 23 September, 2014; v1 submitted 28 March, 2013;
originally announced March 2013.
-
Comparison and anti-concentration bounds for maxima of Gaussian random vectors
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
Slepian and Sudakov-Fernique type inequalities, which compare expectations of maxima of Gaussian random vectors under certain restrictions on the covariance matrices, play an important role in probability theory, especially in empirical process and extreme value theories. Here we give explicit comparisons of expectations of smooth functions and distribution functions of maxima of Gaussian random v…
▽ More
Slepian and Sudakov-Fernique type inequalities, which compare expectations of maxima of Gaussian random vectors under certain restrictions on the covariance matrices, play an important role in probability theory, especially in empirical process and extreme value theories. Here we give explicit comparisons of expectations of smooth functions and distribution functions of maxima of Gaussian random vectors without any restriction on the covariance matrices. We also establish an anti-concentration inequality for the maximum of a Gaussian random vector, which derives a useful upper bound on the Lévy concentration function for the Gaussian maximum. The bound is dimension-free and applies to vectors with arbitrary covariance matrices. This anti-concentration inequality plays a crucial role in establishing bounds on the Kolmogorov distance between maxima of Gaussian random vectors. These results have immediate applications in mathematical statistics. As an example of application, we establish a conditional multiplier central limit theorem for maxima of sums of independent random vectors where the dimension of the vectors is possibly much larger than the sample size.
△ Less
Submitted 12 April, 2014; v1 submitted 21 January, 2013;
originally announced January 2013.
-
Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
We derive a Gaussian approximation result for the maximum of a sum of high-dimensional random vectors. Specifically, we establish conditions under which the distribution of the maximum is approximated by that of the maximum of a sum of the Gaussian random vectors with the same covariance matrices as the original vectors. This result applies when the dimension of random vectors ($p$) is large compa…
▽ More
We derive a Gaussian approximation result for the maximum of a sum of high-dimensional random vectors. Specifically, we establish conditions under which the distribution of the maximum is approximated by that of the maximum of a sum of the Gaussian random vectors with the same covariance matrices as the original vectors. This result applies when the dimension of random vectors ($p$) is large compared to the sample size ($n$); in fact, $p$ can be much larger than $n$, without restricting correlations of the coordinates of these vectors. We also show that the distribution of the maximum of a sum of the random vectors with unknown covariance matrices can be consistently estimated by the distribution of the maximum of a sum of the conditional Gaussian random vectors obtained by multiplying the original vectors with i.i.d. Gaussian multipliers. This is the Gaussian multiplier (or wild) bootstrap procedure. Here too, $p$ can be large or even much larger than $n$. These distributional approximations, either Gaussian or conditional Gaussian, yield a high-quality approximation to the distribution of the original maximum, often with approximation error decreasing polynomially in the sample size, and hence are of interest in many applications. We demonstrate how our Gaussian approximations and the multiplier bootstrap can be used for modern high-dimensional estimation, multiple hypothesis testing, and adaptive specification testing. All these results contain nonasymptotic bounds on approximation errors.
△ Less
Submitted 22 January, 2018; v1 submitted 31 December, 2012;
originally announced December 2012.
-
Gaussian approximation of suprema of empirical processes
Authors:
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
This paper develops a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating whole empirical processes in the sup-norm. We prove an abstract approximation theorem applicable to a wide variety of statistical problems, such as construction of uniform confidence bands for functions. Notably,…
▽ More
This paper develops a new direct approach to approximating suprema of general empirical processes by a sequence of suprema of Gaussian processes, without taking the route of approximating whole empirical processes in the sup-norm. We prove an abstract approximation theorem applicable to a wide variety of statistical problems, such as construction of uniform confidence bands for functions. Notably, the bound in the main approximation theorem is nonasymptotic and the theorem allows for functions that index the empirical process to be unbounded and have entropy divergent with the sample size. The proof of the approximation theorem builds on a new coupling inequality for maxima of sums of random vectors, the proof of which depends on an effective use of Stein's method for normal approximation, and some new empirical process techniques. We study applications of this approximation theorem to local and series empirical processes arising in nonparametric estimation via kernel and series methods, where the classes of functions change with the sample size and are non-Donsker. Importantly, our new technique is able to prove the Gaussian approximation for the supremum type statistics under weak regularity conditions, especially concerning the bandwidth and the number of series functions, in those examples.
△ Less
Submitted 17 August, 2014; v1 submitted 31 December, 2012;
originally announced December 2012.
-
Testing Regression Monotonicity in Econometric Models
Authors:
Denis Chetverikov
Abstract:
Monotonicity is a key qualitative prediction of a wide array of economic models derived via robust comparative statics. It is therefore important to design effective and practical econometric methods for testing this prediction in empirical analysis. This paper develops a general nonparametric framework for testing monotonicity of a regression function. Using this framework, a broad class of new t…
▽ More
Monotonicity is a key qualitative prediction of a wide array of economic models derived via robust comparative statics. It is therefore important to design effective and practical econometric methods for testing this prediction in empirical analysis. This paper develops a general nonparametric framework for testing monotonicity of a regression function. Using this framework, a broad class of new tests is introduced, which gives an empirical researcher a lot of flexibility to incorporate ex ante information she might have. The paper also develops new methods for simulating critical values, which are based on the combination of a bootstrap procedure and new selection algorithms. These methods yield tests that have correct asymptotic size and are asymptotically nonconservative. It is also shown how to obtain an adaptive rate optimal test that has the best attainable rate of uniform consistency against models whose regression function has Lipschitz-continuous first-order derivatives and that automatically adapts to the unknown smoothness of the regression function. Simulations show that the power of the new tests in many cases significantly exceeds that of some prior tests, e.g. that of Ghosal, Sen, and Van der Vaart (2000). An application of the developed procedures to the dataset of Ellison and Ellison (2011) shows that there is some evidence of strategic entry deterrence in pharmaceutical industry where incumbents may use strategic investment to prevent generic entries when their patents expire.
△ Less
Submitted 3 December, 2013; v1 submitted 30 December, 2012;
originally announced December 2012.
-
Some New Asymptotic Theory for Least Squares Series: Pointwise and Uniform Results
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Kengo Kato
Abstract:
In applications it is common that the exact form of a conditional expectation is unknown and having flexible functional forms can lead to improvements. Series method offers that by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size $n$. We consider series estimators for the conditional mean in light of: (i) sharp LLNs for matrices der…
▽ More
In applications it is common that the exact form of a conditional expectation is unknown and having flexible functional forms can lead to improvements. Series method offers that by approximating the unknown function based on $k$ basis functions, where $k$ is allowed to grow with the sample size $n$. We consider series estimators for the conditional mean in light of: (i) sharp LLNs for matrices derived from the noncommutative Khinchin inequalities, (ii) bounds on the Lebesgue factor that controls the ratio between the $L^\infty$ and $L_2$-norms of approximation errors, (iii) maximal inequalities for processes whose entropy integrals diverge, and (iv) strong approximations to series-type processes.
These technical tools allow us to contribute to the series literature, specifically the seminal work of Newey (1997), as follows. First, we weaken the condition on the number $k$ of approximating functions used in series estimation from the typical $k^2/n \to 0$ to $k/n \to 0$, up to log factors, which was available only for spline series before. Second, we derive $L_2$ rates and pointwise central limit theorems results when the approximation error vanishes. Under an incorrectly specified model, i.e. when the approximation error does not vanish, analogous results are also shown. Third, under stronger conditions we derive uniform rates and functional central limit theorems that hold if the approximation error vanishes or not. That is, we derive the strong approximation for the entire estimate of the nonparametric function.
We derive uniform rates, Gaussian approximations, and uniform confidence bands for a wide collection of linear functionals of the conditional expectation function.
△ Less
Submitted 17 June, 2015; v1 submitted 3 December, 2012;
originally announced December 2012.
-
Adaptive Test of Conditional Moment Inequalities
Authors:
Denis Chetverikov
Abstract:
In this paper, I construct a new test of conditional moment inequalities, which is based on studentized kernel estimates of moment functions with many different values of the bandwidth parameter. The test automatically adapts to the unknown smoothness of moment functions and has uniformly correct asymptotic size. The test has high power in a large class of models with conditional moment inequaliti…
▽ More
In this paper, I construct a new test of conditional moment inequalities, which is based on studentized kernel estimates of moment functions with many different values of the bandwidth parameter. The test automatically adapts to the unknown smoothness of moment functions and has uniformly correct asymptotic size. The test has high power in a large class of models with conditional moment inequalities. Some existing tests have nontrivial power against n^{-1/2}-local alternatives in a certain class of these models whereas my method only allows for nontrivial testing against (n/\log n)^{-1/2}-local alternatives in this class. There exist, however, other classes of models with conditional moment inequalities where the mentioned tests have much lower power in comparison with the test developed in this paper.
△ Less
Submitted 5 January, 2012; v1 submitted 30 December, 2011;
originally announced January 2012.
-
Conditional Quantile Processes based on Series or Many Regressors
Authors:
Alexandre Belloni,
Victor Chernozhukov,
Denis Chetverikov,
Iván Fernández-Val
Abstract:
Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR-series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In…
▽ More
Quantile regression (QR) is a principal regression method for analyzing the impact of covariates on outcomes. The impact is described by the conditional quantile function and its functionals. In this paper we develop the nonparametric QR-series framework, covering many regressors as a special case, for performing inference on the entire conditional quantile function and its linear functionals. In this framework, we approximate the entire conditional quantile function by a linear combination of series terms with quantile-specific coefficients and estimate the function-valued coefficients from the data. We develop large sample theory for the QR-series coefficient process, namely we obtain uniform strong approximations to the QR-series coefficient process by conditionally pivotal and Gaussian processes. Based on these strong approximations, or couplings, we develop four resampling methods (pivotal, gradient bootstrap, Gaussian, and weighted bootstrap) that can be used for inference on the entire QR-series coefficient function.
We apply these results to obtain estimation and inference methods for linear functionals of the conditional quantile function, such as the conditional quantile function itself, its partial derivatives, average partial derivatives, and conditional average partial derivatives. Specifically, we obtain uniform rates of convergence and show how to use the four resampling methods mentioned above for inference on the functionals. All of the above results are for function-valued parameters, holding uniformly in both the quantile index and the covariate value, and covering the pointwise case as a by-product. We demonstrate the practical utility of these results with an example, where we estimate the price elasticity function and test the Slutsky condition of the individual demand for gasoline, as indexed by the individual unobserved propensity for gasoline consumption.
△ Less
Submitted 9 August, 2018; v1 submitted 30 May, 2011;
originally announced May 2011.