Conditional Rank-Rank Regression

Authors: Victor Chernozhukov, Iván Fernández-Val, Jonas Meier, Aico van Vuuren, Francis Vella

Abstract: Rank-rank regressions are widely used in economic research to evaluate phenomena such as intergenerational income persistence or mobility. However, when covariates are incorporated to capture between-group persistence, the resulting coefficients can be difficult to interpret as such. We propose the conditional rank-rank regression, which uses conditional ranks instead of unconditional ranks, to measure average within-group income persistence. This property is analogous to that of the unconditional rank-rank regression that measures the overall income persistence. The difference between conditional and unconditional rank-rank regression coefficients therefore can measure between-group persistence. We develop a flexible estimation approach using distribution regression and establish a theoretical framework for large sample inference. An empirical study on intergenerational income mobility in Switzerland demonstrates the advantages of this approach. The study reveals stronger intergenerational persistence between fathers and sons compared to fathers and daughters, with the within-group persistence explaining 62% of the overall income persistence for sons and 52% for daughters. Families of small size or with highly educated fathers exhibit greater persistence in passing on their economic status.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 40 pages, 3 figures, 8 tables

MSC Class: 62P20
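
A minimal sketch of the contrast drawn in the abstract above, under loud assumptions: the covariate is a single discrete group label, and conditional ranks are computed as within-group empirical ranks. The paper itself estimates conditional ranks by distribution regression, which this sketch does not implement. All function names and the simulated data are hypothetical.

```python
import numpy as np

def normalized_ranks(values):
    """Map values to ranks in (0, 1]; ties are broken by order of appearance."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks / len(values)

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def rank_rank_slope(parent, child):
    """Unconditional rank-rank slope: ranks are taken over the whole sample."""
    return ols_slope(normalized_ranks(parent), normalized_ranks(child))

def conditional_rank_rank_slope(parent, child, group):
    """Assumed simplification: ranks are taken within each discrete group."""
    parent, child, group = map(np.asarray, (parent, child, group))
    rank_parent = np.empty(len(parent))
    rank_child = np.empty(len(child))
    for g in np.unique(group):
        mask = group == g
        rank_parent[mask] = normalized_ranks(parent[mask])
        rank_child[mask] = normalized_ranks(child[mask])
    return ols_slope(rank_parent, rank_child)

# Simulated data (hypothetical): income depends on the group only, so all
# persistence is between-group and none is within-group.
rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)
parent = 2.0 * group + rng.normal(size=n)
child = 2.0 * group + rng.normal(size=n)

overall = rank_rank_slope(parent, child)
within = conditional_rank_rank_slope(parent, child, group)
print(f"overall: {overall:.3f}, within-group: {within:.3f}")
```

In this simulated design the overall slope is clearly positive while the within-group slope is close to zero, so the gap between the two is attributable to between-group persistence.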

arXiv:2403.05850 [pdf, other]

Estimating Causal Effects of Discrete and Continuous Treatments with Binary Instruments

Authors: Victor Chernozhukov, Iván Fernández-Val, Sukjin Han, Kaspar Wüthrich

Abstract: We propose an instrumental variable framework for identifying and estimating average and quantile effects of discrete and continuous treatments with binary instruments. The basis of our approach is a local copula representation of the joint distribution of the potential outcomes and unobservables determining treatment assignment. This representation allows us to introduce an identifying assumption, so-called copula invariance, that restricts the local dependence of the copula with respect to the treatment propensity. We show that copula invariance identifies treatment effects for the entire population and other subpopulations such as the treated. The identification results are constructive and lead to straightforward semiparametric estimation procedures based on distribution regression. An application to the effect of sleep on well-being uncovers interesting patterns of heterogeneity.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.02467 [pdf]

Applied Causal Inference Powered by ML and AI

Authors: Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

Abstract: An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.04674 [pdf, other]

Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study

Authors: Philipp Bach, Oliver Schacht, Victor Chernozhukov, Sven Klaassen, Martin Spindler

Abstract: Proper hyperparameter tuning is essential for achieving optimal performance of modern machine learning (ML) methods in predictive tasks. While there is an extensive literature on tuning ML learners for prediction, there is only little guidance available on tuning ML learners for causal machine learning and how to select among different ML learners. In this paper, we empirically assess the relationship between the predictive performance of ML methods and the resulting causal estimation based on the Double Machine Learning (DML) approach by Chernozhukov et al. (2018). DML relies on estimating so-called nuisance parameters by treating them as supervised learning problems and using them as plug-in estimates to solve for the (causal) parameter. We conduct an extensive simulation study using data from the 2019 Atlantic Causal Inference Conference Data Challenge. We provide empirical insights on the role of hyperparameter tuning and other practical decisions for causal estimation with DML. First, we assess the importance of data splitting schemes for tuning ML learners within Double Machine Learning. Second, we investigate how the choice of ML methods and hyperparameters, including recent AutoML frameworks, impacts the estimation performance for a causal parameter of interest. Third, we assess to what extent the choice of a particular causal model, as characterized by incorporated parametric assumptions, can be based on predictive performance metrics.

Submitted 7 February, 2024; originally announced February 2024.
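
The abstract above describes DML as estimating nuisance parameters by supervised learning and plugging them in to solve for the causal parameter. A minimal sketch of that recipe for the partially linear model follows, assuming cross-fitting with a partialling-out score. The nuisance learner here is plain OLS, a stand-in chosen only to keep the example dependency-free; the function names, fold scheme, and simulated data are all hypothetical and are not taken from the paper or from any DML library.

```python
import numpy as np

def ols_predict(x_train, y_train, x_test):
    """Stand-in nuisance learner (assumption): OLS with an intercept."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef = np.linalg.lstsq(design, y_train, rcond=None)[0]
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef

def dml_plr(y, d, x, n_folds=5, seed=0, learner=ols_predict):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    n = len(y)
    rng = np.random.default_rng(seed)
    fold_id = rng.permutation(n) % n_folds  # random, nearly equal-sized folds
    res_y = np.empty(n)
    res_d = np.empty(n)
    for k in range(n_folds):
        test = fold_id == k
        train = ~test
        # Nuisance functions E[y|x] and E[d|x] are fit on the other folds only.
        res_y[test] = y[test] - learner(x[train], y[train], x[test])
        res_d[test] = d[test] - learner(x[train], d[train], x[test])
    # Solve the orthogonal moment condition: regress residual on residual.
    theta = (res_d @ res_y) / (res_d @ res_d)
    score = (res_y - theta * res_d) * res_d
    se = np.sqrt(np.mean(score**2) / np.mean(res_d**2) ** 2 / n)
    return theta, se

# Simulated data (hypothetical) with true effect 0.5.
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=(n, 3))
d = x @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
y = 0.5 * d + x @ np.array([0.3, 0.3, 0.3]) + rng.normal(size=n)

theta_hat, se_hat = dml_plr(y, d, x)
print(f"theta_hat = {theta_hat:.3f} (se {se_hat:.3f})")
```

Swapping `learner` for a tuned ML method is where the tuning and data-splitting questions studied in the paper would enter.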

arXiv:2402.01785 [pdf, other]

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

Authors: Sven Klaassen, Jan Teichert-Kluge, Philipp Bach, Victor Chernozhukov, Martin Spindler, Suhas Vijaykumar

Abstract: This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data.

Submitted 1 February, 2024; originally announced February 2024.

MSC Class: 62; 91

ACM Class: I.2.0

arXiv:2402.00584 [pdf, ps, other]

Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models

Authors: Victor Chernozhukov, Iván Fernández-Val, Chen Huang, Weining Wang

Abstract: The Arellano-Bond estimator is a fundamental method for dynamic panel data models, widely used in practice. However, the estimator is severely biased when the data's time series dimension $T$ is long due to the large degree of overidentification. We show that weak dependence along the panel's time series dimension naturally implies approximate sparsity of the most informative moment conditions, motivating the following approach to remove the bias: First, apply LASSO to the cross-section data at each time period to construct most informative (and cross-fitted) instruments, using lagged values of suitable covariates. This step relies on approximate sparsity to select the most informative instruments. Second, apply a linear instrumental variable estimator after first differencing the dynamic structural equation using the constructed instruments. Under weak time series dependence, we show the new estimator is consistent and asymptotically normal under much weaker conditions on $T$'s growth than the Arellano-Bond estimator. Our theory covers models with high dimensional covariates, including multiple lags of the dependent variable, common in modern applications. We illustrate our approach by applying it to weekly county-level panel data from the United States to study opening K-12 schools and other mitigation policies' short and long-term effects on COVID-19's spread.

Submitted 16 October, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2307.04527 [pdf, other]

Automatic Debiased Machine Learning for Covariate Shifts

Authors: Victor Chernozhukov, Michael Newey, Whitney K Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: In this paper we address the problem of bias in machine learning of parameters following covariate shifts. Covariate shift occurs when the distribution of input features changes between the training and deployment stages. Regularization and model selection associated with machine learning biases many parameter estimates. In this paper, we propose an automatic debiased machine learning approach to correct for this bias under covariate shifts. The proposed approach leverages state-of-the-art techniques in debiased machine learning to debias estimators of policy and causal parameters when covariate shift is present. The debiasing is automatic in only relying on the parameter of interest and not requiring the form of the bias. We show that our estimator is asymptotically normal as the sample size grows. Finally, we demonstrate the proposed method on a regression problem using a Monte-Carlo simulation.

Submitted 19 April, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

arXiv:2305.00044 [pdf, other]

Hedonic Prices and Quality Adjusted Price Indices Powered by AI

Authors: Patrick Bajari, Zhihao Cen, Victor Chernozhukov, Manoj Manukonda, Suhas Vijaykumar, Jin Wang, Ramon Huerta, Junbo Li, Ling Leng, George Monokroussos, Shan Wan

Abstract: Accurate, real-time measurements of price index changes using electronic records are essential for tracking inflation and productivity in today's economic environment. We develop empirical hedonic models that can process large amounts of unstructured product data (text, images, prices, quantities) and output accurate hedonic price estimates and derived indices. To accomplish this, we generate abstract product attributes, or ``features,'' from text descriptions and images using deep neural networks, and then use these attributes to estimate the hedonic price function. Specifically, we convert textual information about the product to numeric features using large language models based on transformers, trained or fine-tuned using product descriptions, and convert the product image to numeric features using a residual network model. To produce the estimated hedonic price function, we again use a multi-task neural network trained to predict a product's price in all time periods simultaneously. To demonstrate the performance of this approach, we apply the models to Amazon's data for first-party apparel sales and estimate hedonic prices. The resulting models have high predictive accuracy, with $R^2$ ranging from $80\%$ to $90\%$. Finally, we construct the AI-based hedonic Fisher price index, chained at the year-over-year frequency. We contrast the index with the CPI and other electronic indices.

Submitted 28 April, 2023; originally announced May 2023.

Comments: Revised CEMMAP Working Paper (CWP08/23)

arXiv:2301.07782 [pdf, other]

doi 10.1016/S0304-4076(03)00100-3

An MCMC Approach to Classical Estimation

Authors: Victor Chernozhukov, Han Hong

Abstract: This paper studies computationally and theoretically attractive estimators called the Laplace type estimators (LTE), which include means and quantiles of Quasi-posterior distributions defined as transformations of general (non-likelihood-based) statistical criterion functions, such as those in GMM, nonlinear IV, empirical likelihood, and minimum distance methods. The approach generates an alternative to classical extremum estimation and also falls outside the parametric Bayesian approach. For example, it offers a new attractive estimation method for such important semi-parametric problems as censored and instrumental quantile, nonlinear GMM and value-at-risk models. The LTE's are computed using Markov Chain Monte Carlo methods, which help circumvent the computational curse of dimensionality. A large sample theory is obtained for regular cases.

Submitted 18 January, 2023; originally announced January 2023.

Comments: This is an archival version of the article "An MCMC approach to classical estimation", Journal of econometrics 115 (2), August 2003, pages 293-346. This version does not reflect the corrections made to the article during the publication process; it contains additional two remarks added, as indicated in the text. 62 pages, 7 figures

Journal ref: Journal of econometrics 115 (2), August 2003, pages 293-346
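
A minimal sketch of the idea in the abstract above, for the simplest possible case: estimating a median. It assumes a flat prior, the criterion L_n(theta) = -sum_i |y_i - theta|, and a random-walk Metropolis sampler, and it reports the quasi-posterior mean as the Laplace type estimate. The step size, number of draws, and function name are hypothetical tuning choices, not values from the paper.

```python
import numpy as np

def lte_median(y, n_draws=6000, burn_in=1000, step=0.1, seed=0):
    """Quasi-posterior mean and 5%/95% quantiles for a median, via Metropolis."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)

    def criterion(theta):
        # Non-likelihood criterion: negative least-absolute-deviations objective.
        return -np.abs(y - theta).sum()

    theta = y.mean()  # starting value
    current = criterion(theta)
    draws = np.empty(n_draws)
    for t in range(n_draws):
        proposal = theta + step * rng.normal()
        candidate = criterion(proposal)
        # Accept with probability min(1, exp(candidate - current)).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[t] = theta
    kept = draws[burn_in:]
    return kept.mean(), np.quantile(kept, [0.05, 0.95])

# Simulated data (hypothetical).
sample = np.random.default_rng(1).normal(size=500)
lte_estimate, lte_band = lte_median(sample)
print(f"LTE: {lte_estimate:.3f}, sample median: {np.median(sample):.3f}")
```

The sampler only ever evaluates the criterion, never its derivatives, which is what makes the approach convenient for non-smooth objectives such as this one.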

arXiv:2207.13081 [pdf, other]

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Authors: Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play similar roles as classical value functions in fully-observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain the PAC result, which implies our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states, and the Bellman completeness. Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

Submitted 14 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: This paper was accepted in NeurIPS 2023

arXiv:2205.09691 [pdf, other]

High-dimensional Data Bootstrap

Authors: Victor Chernozhukov, Denis Chetverikov, Kengo Kato, Yuta Koike

Abstract: This article reviews recent progress in high-dimensional bootstrap. We first review high-dimensional central limit theorems for distributions of sample mean vectors over the rectangles, bootstrap consistency results in high dimensions, and key techniques used to establish those results. We then review selected applications of high-dimensional bootstrap: construction of simultaneous confidence sets for high-dimensional vector parameters, multiple hypothesis testing via stepdown, post-selection inference, intersection bounds for partially identified parameters, and inference on best policies in policy evaluation. Finally, we also comment on a couple of future research directions.

Submitted 19 May, 2022; originally announced May 2022.

Comments: 27 pages; review article

arXiv:2203.13887 [pdf, other]

Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: We extend the idea of automated debiased machine learning to the dynamic treatment regime and more generally to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation learning algorithm that estimates de-biasing corrections without the need to characterize what the correction terms look like, such as for instance, products of inverse probability weighting terms, as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems, whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need for solving auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. We provide further applications of our approach to estimation of dynamic discrete choice models and estimation of long-term effects with surrogates.

Submitted 20 June, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2112.13398 [pdf, other]

Long Story Short: Omitted Variable Bias in Causal Machine Learning

Authors: Victor Chernozhukov, Carlos Cinelli, Whitney Newey, Amit Sharma, Vasilis Syrgkanis

Abstract: We develop a general theory of omitted variable bias for a wide range of common causal parameters, including (but not limited to) averages of potential outcomes, average treatment effects, average causal derivatives, and policy effects from covariate shifts. Our theory applies to nonparametric models, while naturally allowing for (semi-)parametric restrictions (such as partial linearity) when such assumptions are made. We show how simple plausibility judgments on the maximum explanatory power of omitted variables are sufficient to bound the magnitude of the bias, thus facilitating sensitivity analysis in otherwise complex, nonlinear models. Finally, we provide flexible and efficient statistical inference methods for the bounds, which can leverage modern machine learning algorithms for estimation. These results allow empirical researchers to perform sensitivity analyses in a flexible class of machine-learned causal models using very simple, and interpretable, tools. We demonstrate the utility of our approach with two empirical examples.

Submitted 26 May, 2024; v1 submitted 26 December, 2021; originally announced December 2021.

Comments: This is an extended version of the paper prepared for the NeurIPS-2021 Workshop "Causal Inference & Machine Learning: Why now?"; 55 pages; 10 figures

MSC Class: 62G

arXiv:2110.06136 [pdf, other]

A Response to Philippe Lemoine's Critique on our Paper "Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S."

Authors: Victor Chernozhukov, Hiroyuki Kasahara, Paul Schrimpf

Abstract: Recently, Philippe Lemoine posted a critique of our paper "Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S." [arXiv:2005.14168] in his post titled "Lockdowns, econometrics and the art of putting lipstick on a pig." Although Lemoine's critique appears ideologically driven and overly emotional, some of his points are worth addressing. In particular, the sensitivity of our estimation results to (i) including "masks in public spaces" and (ii) updating the data seem to be important critiques and, therefore, we decided to analyze the updated data ourselves. This note summarizes our findings from re-examining the updated data and responds to Philippe Lemoine's critique on these two important points. We also briefly discuss other points Lemoine raised in his post. After analyzing the updated data, we find evidence that reinforces the conclusions reached in the original study.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2110.03031 [pdf, other]

RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests

Authors: Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, Vasilis Syrgkanis

Abstract: Many causal and policy effects of interest are defined by linear functionals of high-dimensional or non-parametric regression functions. $\sqrt{n}$-consistent and asymptotically normal estimation of the object of interest requires debiasing to reduce the effects of regularization and/or model selection on the object of interest. Debiasing is typically achieved by adding a correction term to the plug-in estimator of the functional, which leads to properties such as semi-parametric efficiency, double robustness, and Neyman orthogonality. We implement an automatic debiasing procedure based on automatically learning the Riesz representation of the linear functional using Neural Nets and Random Forests. Our method only relies on black-box evaluation oracle access to the linear functional and does not require knowledge of its analytic form. We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. We also propose a Random Forest method which learns a locally linear representation of the Riesz function. Even though our method applies to arbitrary functionals, we experimentally find that it performs well compared to the state-of-the-art neural net based algorithm of Shi et al. (2019) for the case of the average treatment effect functional. We also evaluate our method on the problem of estimating average marginal effects with continuous treatments, using semi-synthetic data of gasoline price changes on gasoline demand.

Submitted 15 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Accepted for a long presentation at the ICML. Code available at https://github.com/victor5as/RieszLearning
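
A minimal sketch of the debiasing idea in the abstract above, for the average treatment effect. It assumes a small linear dictionary b(d, x) = (1, d, x, d*x) in place of the neural nets and random forests used in the paper, so the Riesz loss E[alpha(W)^2 - 2 m(W; alpha)] has a closed-form minimizer. It also skips cross-fitting and regularization. The dictionary, function names, and simulated data are hypothetical; note that the functional enters only through evaluations of m(W; b) = b(1, X) - b(0, X).

```python
import numpy as np

def basis(d, x):
    """Hypothetical dictionary b(d, x) = (1, d, x, d*x) for scalar x."""
    return np.column_stack([np.ones_like(x), d, x, d * x])

def riesz_ate(y, d, x):
    """Debiased ATE with a linear-in-dictionary Riesz representer (in-sample)."""
    n = len(y)
    b = basis(d, x)
    # Functional applied to each dictionary element: b(1, X) - b(0, X).
    m_b = basis(np.ones_like(x), x) - basis(np.zeros_like(x), x)
    # Minimizer of mean((b @ rho)**2 - 2 * m_b @ rho) solves gram @ rho = m_bar.
    gram = b.T @ b / n
    m_bar = m_b.mean(axis=0)
    rho = np.linalg.solve(gram, m_bar)
    alpha = b @ rho  # estimated Riesz representer at each observation
    # Regression on the same dictionary (a stand-in for an ML regression).
    beta = np.linalg.lstsq(b, y, rcond=None)[0]
    plug_in = m_b @ beta
    correction = alpha * (y - b @ beta)
    return float(np.mean(plug_in + correction)), alpha

# Simulated data (hypothetical): randomized treatment, true ATE equal to 1.
rng = np.random.default_rng(7)
n = 5000
x_sim = rng.normal(size=n)
d_sim = rng.integers(0, 2, n).astype(float)
y_sim = 1.0 * d_sim + x_sim + rng.normal(size=n)

ate_hat, alpha_hat = riesz_ate(y_sim, d_sim, x_sim)
print(f"ATE estimate: {ate_hat:.3f}")
print(f"mean alpha, treated: {alpha_hat[d_sim == 1].mean():.3f}")
```

With treatment probability one half, the true Riesz representer is 2 for treated and -2 for control units, which gives a quick sanity check on `alpha_hat`.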

arXiv:2107.02602 [pdf, ps, other]

Inference for Low-Rank Models

Authors: Victor Chernozhukov, Christian Hansen, Yuan Liao, Yinchu Zhu

Abstract: This paper studies inference in linear models with a high-dimensional parameter matrix that can be well-approximated by a ``spiked low-rank matrix.'' A spiked low-rank matrix has rank that grows slowly compared to its dimensions and nonzero singular values that diverge to infinity. We show that this framework covers a broad class of models of latent-variables which can accommodate matrix completion problems, factor models, varying coefficient models, and heterogeneous treatment effects. For inference, we apply a procedure that relies on an initial nuclear-norm penalized estimation step followed by two ordinary least squares regressions. We consider the framework of estimating incoherent eigenvectors and use a rotation argument to argue that the eigenspace estimation is asymptotically unbiased. Using this framework we show that our procedure provides asymptotically normal inference and achieves the semiparametric efficiency bound. We illustrate our framework by providing low-level conditions for its application in a treatment effects context where treatment assignment might be strongly dependent.

Submitted 2 January, 2023; v1 submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.09762 [pdf, other]

Causal Bias Quantification for Continuous Treatments

Authors: Gianluca Detommaso, Michael Brückner, Philip Schulz, Victor Chernozhukov

Abstract: We extend the definition of the marginal causal effect to the continuous treatment setting and develop a novel characterization of causal bias in the framework of structural causal models. We prove that our derived bias expression is zero if, and only if, the causal effect is identifiable via covariate adjustment. We show that under some restrictions on the structural equations, the causal bias can be estimated efficiently and allows for causal regularization of predictive probabilistic models. We demonstrate the effectiveness of our method for causal bias quantification in various settings where (not) controlling for certain covariates would introduce causal bias.

Submitted 30 January, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

arXiv:2105.15197 [pdf, ps, other]

A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees

Authors: Victor Chernozhukov, Whitney K. Newey, Rahul Singh

Abstract: Debiased machine learning is a meta algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals, i.e. scalar summaries, of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is $n^{-1/2}$ for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a general double robustness property for ill posed inverse problems.

Submitted 21 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: Biometrika 2022
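
The two ingredients named in the abstract above, bias correction and sample splitting, can be illustrated with a minimal sketch. It assumes a partially linear model y = theta*d + g(X) + e, which the abstract does not single out, and uses ordinary least squares as a stand-in for a machine learner. The function name `cross_fit_plr` and its arguments are hypothetical; this is neither the paper's theorem nor any package's API.

```python
import numpy as np

def cross_fit_plr(y, d, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + e.

    Illustrative assumption: OLS on X stands in for the nuisance learners.
    Returns the point estimate and a plug-in standard error.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    Xc = np.column_stack([np.ones(n), X])  # add an intercept column
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Sample splitting: fit nuisances on the other folds only.
        beta_y, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        beta_d, *_ = np.linalg.lstsq(Xc[train], d[train], rcond=None)
        # Residualize the held-out fold with the out-of-fold fits.
        y_res[test_idx] = y[test_idx] - Xc[test_idx] @ beta_y
        d_res[test_idx] = d[test_idx] - Xc[test_idx] @ beta_d
    # Bias correction: residual-on-residual (orthogonal) moment.
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi**2) / np.mean(d_res**2) ** 2 / n)
    return theta, se
```

A 95% confidence interval would then be `theta ± 1.96 * se`. Swapping the two `lstsq` calls for any learner with fit and predict steps keeps the structure unchanged.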

arXiv:2105.07424 [pdf, other]

Uniform Inference on High-dimensional Spatial Panel Networks

Authors: Victor Chernozhukov, Chen Huang, Weining Wang

Abstract: We propose employing a debiased-regularized, high-dimensional generalized method of moments (GMM) framework to perform inference on large-scale spatial panel networks. In particular, network structure with a flexible sparse deviation, which can be regarded either as latent or as misspecified from a predetermined adjacency matrix, is estimated using a debiased machine learning approach. The theoretical analysis establishes the consistency and asymptotic normality of our proposed estimator, taking into account general temporal and spatial dependency inherent in the data-generating processes. The dimensionality allowance in the presence of dependency is discussed. A primary contribution of our study is the development of uniform inference theory that enables hypothesis testing on the parameters of interest, including zero or non-zero elements in the network structure. Additionally, the asymptotic properties for the estimator are derived for both linear and nonlinear moments. Simulations demonstrate superior performance of our proposed approach. Lastly, we apply our methodology to investigate the spatial network effect of stock returns.

Submitted 7 September, 2023; v1 submitted 16 May, 2021; originally announced May 2021.

arXiv:2105.04646 [pdf, other]

Deeply-Debiased Off-Policy Interval Estimation

Authors: Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song

Abstract: Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.

Submitted 7 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

arXiv:2104.14737 [pdf, other]

Automatic Debiased Machine Learning via Riesz Regression

Authors: Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, Vasilis Syrgkanis

Abstract: A variety of interesting parameters may depend on high dimensional regressions. Machine learning can be used to estimate such parameters. However, estimators based on machine learners can be severely biased by regularization and/or model selection. Debiased machine learning uses Neyman orthogonal estimating equations to reduce such biases. Debiased machine learning generally requires estimation of unknown Riesz representers. A primary innovation of this paper is to provide Riesz regression estimators of Riesz representers that depend on the parameter of interest, rather than explicit formulae, and that can employ any machine learner, including neural nets and random forests. End-to-end algorithms emerge where the researcher chooses the parameter of interest and the machine learner and the debiasing follows automatically. Another innovation here is debiased machine learners of parameters depending on generalized regressions, including high-dimensional generalized linear models. An empirical example of automatic debiased machine learning using neural nets is given. We find in Monte Carlo examples that automatic debiasing sometimes performs better than debiasing via inverse propensity scores and never worse. Finite sample mean square error bounds for Riesz regression estimators and asymptotic theory are also given.

Submitted 14 March, 2024; v1 submitted 29 April, 2021; originally announced April 2021.

Comments: arXiv admin note: text overlap with arXiv:1809.05224

MSC Class: 62D20; 62P20 (Primary); 62G20; 62J02 (Secondary)

arXiv:2104.03220 [pdf, other]

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python

Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler

Abstract: DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al. (2018) for a variety of causal models. It contains functionalities for valid statistical inference on causal parameters when the estimation of nuisance parameters is based on machine learning methods. The object-oriented implementation of DoubleML provides high flexibility in terms of model specifications and makes it easily extendable. The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas, scipy, statsmodels and joblib. Source code, documentation and an extensive user guide can be found at https://github.com/DoubleML/doubleml-for-py and https://docs.doubleml.org.

Submitted 20 December, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: 6 pages, 2 figures

MSC Class: 62-04

Journal ref: Journal of Machine Learning Research 23 (53), 2022, 1-6

arXiv:2103.09603 [pdf, other]

doi 10.18637/jss.v108.i03

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R

Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler, Sven Klaassen

Abstract: The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables high flexibility for the model specification and makes it easily extendable. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.

Submitted 5 June, 2024; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: 56 pages, 8 Figures, 1 Table; Updated version for DoubleML 1.0.0; Updated version due to changes in R package paradox (for parameter tuning with mlr3)

MSC Class: 62-04

Journal ref: Journal of Statistical Software 2024

arXiv:2102.12809 [pdf, other]

doi 10.1007/s00181-020-01919-y

Vector quantile regression and optimal transport, from theory to numerics

Authors: Guillaume Carlier, Victor Chernozhukov, Gwendoline De Bie, Alfred Galichon

Abstract: In this paper, we first revisit the Koenker and Bassett variational approach to (univariate) quantile regression, emphasizing its link with latent factor representations and correlation maximization problems. We then review the multivariate extension due to Carlier et al. (2016, 2017) which relates vector quantile regression to an optimal transport problem with mean independence constraints. We introduce an entropic regularization of this problem, implement a gradient descent numerical method and illustrate its feasibility on univariate and bivariate examples.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 35 pages, 19 figures, 4 tables. arXiv admin note: text overlap with arXiv:1610.06833

Journal ref: Empirical Economics (2020)

arXiv:2102.10453 [pdf, other]

doi 10.1073/pnas.2103420118

The Association of Opening K-12 Schools with the Spread of COVID-19 in the United States: County-Level Panel Data Analysis

Authors: Victor Chernozhukov, Hiroyuki Kasahara, Paul Schrimpf

Abstract: This paper empirically examines how the opening of K-12 schools and colleges is associated with the spread of COVID-19 using county-level panel data in the United States. Using data on foot traffic and K-12 school opening plans, we analyze how an increase in visits to schools and opening schools with different teaching methods (in-person, hybrid, and remote) is related to the 2-weeks forward growth rate of confirmed COVID-19 cases. Our debiased panel data regression analysis with a set of county dummies, interactions of state and week dummies, and other controls shows that an increase in visits to both K-12 schools and colleges is associated with a subsequent increase in case growth rates. The estimates indicate that fully opening K-12 schools with in-person learning is associated with a 5 (SE = 2) percentage points increase in the growth rate of cases. We also find that the positive association of K-12 school visits or in-person school openings with case growth is stronger for counties that do not require staff to wear masks at schools. These results have a causal interpretation in a structural model with unobserved county and time confounders. Sensitivity analysis shows that the baseline results are robust to timing assumptions and alternative specifications.

Submitted 15 June, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

arXiv:2101.00009 [pdf, other]

Adversarial Estimation of Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: Many causal parameters are linear functionals of an underlying regression. The Riesz representer is a key component in the asymptotic variance of a semiparametrically estimated linear functional. We propose an adversarial framework to estimate the Riesz representer using general function spaces. We prove a nonasymptotic mean square rate in terms of an abstract quantity called the critical radius, then specialize it for neural networks, random forests, and reproducing kernel Hilbert spaces as leading cases. Our estimators are highly compatible with targeted and debiased machine learning with sample splitting; our guarantees directly verify general conditions for inference that allow mis-specification. We also use our guarantees to prove inference without sample splitting, based on stability or complexity. Our estimators achieve nominal coverage in highly nonlinear simulations where some previous methods break down. They shed new light on the heterogeneous effects of matching grants.

Submitted 26 April, 2024; v1 submitted 30 December, 2020; originally announced January 2021.

arXiv:2012.09513 [pdf, ps, other]

Nearly optimal central limit theorem and bootstrap approximations in high dimensions

Authors: Victor Chernozhukov, Denis Chetverikov, Yuta Koike

Abstract: In this paper, we derive new, nearly optimal bounds for the Gaussian approximation to scaled averages of $n$ independent high-dimensional centered random vectors $X_1,\dots,X_n$ over the class of rectangles in the case when the covariance matrix of the scaled average is non-degenerate. In the case of bounded $X_i$'s, the implied bound for the Kolmogorov distance between the distribution of the scaled average and the Gaussian vector takes the form $$C (B^2_n \log^3 d/n)^{1/2} \log n,$$ where $d$ is the dimension of the vectors and $B_n$ is a uniform envelope constant on components of $X_i$'s. This bound is sharp in terms of $d$ and $B_n$, and is nearly (up to $\log n$) sharp in terms of the sample size $n$. In addition, we show that similar bounds hold for the multiplier and empirical bootstrap approximations. Moreover, we establish bounds that allow for unbounded $X_i$'s, formulated solely in terms of moments of $X_i$'s. Finally, we demonstrate that the bounds can be further improved in some special smooth and zero-skewness cases.

Submitted 12 May, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: 60 pages. We corrected a mistake in v1. Lemmas 6.1-6.3 are reformulated for general rectangles

MSC Class: 60F05; 62E17
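
To get a feel for how the bound $C (B^2_n \log^3 d/n)^{1/2} \log n$ quoted in the abstract above scales, the following small sketch evaluates it numerically. The constant $C$ is not given in the abstract, so `C=1.0` is a placeholder assumption, and the function name is hypothetical; only the relative scaling in $n$, $d$ and $B_n$ is meaningful.

```python
import math

def gaussian_approx_bound(n, d, B_n, C=1.0):
    """Evaluate C * (B_n^2 * log(d)^3 / n)^(1/2) * log(n).

    C is an unspecified constant in the source; C=1.0 is a placeholder,
    so the returned value is only useful for comparing settings.
    """
    return C * math.sqrt(B_n**2 * math.log(d) ** 3 / n) * math.log(n)

# The dimension enters only through log(d)^3, so it can grow quickly with n.
for n, d in [(10_000, 100), (1_000_000, 100), (1_000_000, 10_000)]:
    print(n, d, gaussian_approx_bound(n, d, B_n=1.0))
```

Because the bound is linear in $B_n$, doubling the envelope constant doubles the bound, while raising $d$ from 100 to 10,000 multiplies it by only $2^{3/2}$.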

arXiv:2011.01092 [pdf, other]

Insights from Optimal Pandemic Shielding in a Multi-Group SEIR Framework

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

Abstract: The COVID-19 pandemic constitutes one of the largest threats in recent decades to the health and economic welfare of populations globally. In this paper, we analyze different types of policy measures designed to fight the spread of the virus and minimize economic losses. Our analysis builds on a multi-group SEIR model, which extends the multi-group SIR model introduced by Acemoglu et al. (2020). We adjust the underlying social interaction patterns and consider an extended set of policy measures. The model is calibrated for Germany. Despite the trade-off between COVID-19 prevention and economic activity that is inherent to shielding policies, our results show that efficiency gains can be achieved by targeting such policies towards different age groups. Alternative policies such as physical distancing can be employed to reduce the degree of targeting and the intensity and duration of shielding. Our results show that a comprehensive approach that combines multiple policy measures simultaneously can effectively mitigate population mortality and economic harm.

Submitted 2 November, 2020; originally announced November 2020.

Comments: 39 pages, 23 figures

arXiv:2009.00436 [pdf, ps, other]

Instrumental Variable Quantile Regression

Authors: Victor Chernozhukov, Christian Hansen, Kaspar Wuthrich

Abstract: This chapter reviews the instrumental variable quantile regression model of Chernozhukov and Hansen (2005). We discuss the key conditions used for identification of structural quantile effects within this model, which include the availability of instruments and a restriction on the ranks of structural disturbances. We outline several approaches to obtaining point estimates and performing statistical inference for model parameters. Finally, we point to possible directions for future research.

Submitted 28 August, 2020; originally announced September 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1303.7050

Journal ref: Chapter 9 in: Chernozhukov, V., He, X., Koenker, R., Peng, L. (Eds.), Handbook of Quantile Regression. CRC Chapman-Hall, 2017

arXiv:2005.14168 [pdf, other]

doi 10.1016/j.jeconom.2020.09.003

Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S.

Authors: Victor Chernozhukov, Hiroyuki Kasahara, Paul Schrimpf

Abstract: This paper evaluates the dynamic impact of various policies adopted by US states on the growth rates of confirmed Covid-19 cases and deaths as well as social distancing behavior measured by Google Mobility Reports, where we take into consideration people's voluntary behavioral response to new information of transmission risks. Our analysis finds that both policies and information on transmission risks are important determinants of Covid-19 cases and deaths and shows that a change in policies explains a large fraction of observed changes in social distancing behavior. Our counterfactual experiments suggest that nationally mandating face masks for employees on April 1st could have reduced the growth rate of cases and deaths by more than 10 percentage points in late April, and could have led to as much as 17 to 55 percent fewer deaths nationally by the end of May, which roughly translates into 17 to 55 thousand saved lives. Our estimates imply that removing non-essential business closures (while maintaining school closures, restrictions on movie theaters and restaurants) could have led to -20 to 60 percent more cases and deaths by the end of May. We also find that, without stay-at-home orders, cases would have been larger by 25 to 170 percent, which implies that 0.5 to 3.4 million more Americans could have been infected if stay-at-home orders had not been implemented. Finally, not having implemented any policies could have led to at least a 7-fold increase with an uninformative upper bound in cases (and deaths) by the end of May in the US, with considerable uncertainty over the effects of school closures, which had little cross-sectional variation.

Submitted 19 October, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

Journal ref: Journal of Econometrics (2020)

arXiv:1912.12213 [pdf, other]

Minimax Semiparametric Learning With Approximate Sparsity

Authors: Jelena Bradic, Victor Chernozhukov, Whitney K. Newey, Yinchu Zhu

Abstract: This paper is about the feasibility and means of root-n consistently estimating linear, mean-square continuous functionals of a high dimensional, approximately sparse regression. Such objects include a wide variety of interesting parameters such as regression coefficients, average derivatives, and the average treatment effect. We give lower bounds on the convergence rate of estimators of a regression slope and an average derivative and find that these bounds are substantially larger than in a low dimensional, semiparametric setting. We also give debiased machine learners that are root-n consistent under either a minimal approximate sparsity condition or rate double robustness. These estimators improve on existing estimators in being root-n consistent under more general conditions than previously known.

Submitted 8 August, 2022; v1 submitted 27 December, 2019; originally announced December 2019.

arXiv:1912.10529 [pdf, ps, other]

Improved Central Limit Theorem and bootstrap approximations in high dimensions

Authors: Victor Chernozhukov, Denis Chetverikov, Kengo Kato, Yuta Koike

Abstract: This paper deals with the Gaussian and bootstrap approximations to the distribution of the max statistic in high dimensions. This statistic takes the form of the maximum over components of the sum of independent random vectors and its distribution plays a key role in many high-dimensional econometric problems. Using a novel iterative randomized Lindeberg method, the paper derives new bounds for the distributional approximation errors. These new bounds substantially improve upon existing ones and simultaneously allow for a larger class of bootstrap methods.

Submitted 29 May, 2022; v1 submitted 22 December, 2019; originally announced December 2019.

Comments: 63 pages

arXiv:1909.07889 [pdf, other]

Distributional conformal prediction

Authors: Victor Chernozhukov, Kaspar Wüthrich, Yinchu Zhu

Abstract: We propose a robust method for constructing conditionally valid prediction intervals based on models for conditional distributions such as quantile and distribution regression. Our approach can be applied to important prediction problems including cross-sectional prediction, k-step-ahead forecasts, synthetic controls and counterfactual prediction, and individual treatment effects prediction. Our method exploits the probability integral transform and relies on permuting estimated ranks. Unlike regression residuals, ranks are independent of the predictors, allowing us to construct conditionally valid prediction intervals under heteroskedasticity. We establish approximate conditional validity under consistent estimation and provide approximate unconditional validity under model misspecification, overfitting, and with time series data. We also propose a simple "shape" adjustment of our baseline method that yields optimal prediction intervals.

Submitted 21 August, 2021; v1 submitted 17 September, 2019; originally announced September 2019.

Journal ref: PNAS November 30, 2021 118 (48) e2107794118

arXiv:1909.05782 [pdf, ps, other]

Fast Algorithms for the Quantile Regression Process

Authors: Victor Chernozhukov, Iván Fernández-Val, Blaise Melly

Abstract: The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, the computation time is still non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for the estimation of a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step allows for a reduction of the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton-Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time without significant (if any) cost in the quality of the estimates. For instance, we divide by 100 the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations.

Submitted 6 April, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Comments: 29 pages, 3 figures, 4 tables; for associated Stata package, see https://sites.google.com/site/blaisemelly/home/computer-programs/fast

arXiv:1909.00836 [pdf, other]

SortedEffects: Sorted Causal Effects in R

Authors: Shuowen Chen, Victor Chernozhukov, Iván Fernández-Val, Ye Luo

Abstract: Chernozhukov et al. (2018) proposed the sorted effect method for nonlinear regression models. This method consists of reporting percentiles of the partial effects in addition to the average commonly used to summarize the heterogeneity in the partial effects. They also proposed to use the sorted effects to carry out classification analysis where the observational units are classified as most and least affected if their causal effects are above or below some tail sorted effects. The R package SortedEffects implements the estimation and inference methods therein and provides tools to visualize the results. This vignette serves as an introduction to the package and displays basic functionality of the functions within.

Submitted 6 November, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

Comments: 15 pages, 6 figures, 8 tables

MSC Class: 62-07; 62E20

arXiv:1908.09173 [pdf, ps, other]

Welfare Analysis in Dynamic Models

Authors: Victor Chernozhukov, Whitney Newey, Vira Semenova

Abstract: This paper provides welfare metrics for dynamic choice. We give estimation and inference methods for functions of the expected value of dynamic choice. These parameters include average value by group, average derivatives with respect to endowments, and structural decompositions. The example of dynamic discrete choice is considered. We give dual and doubly robust representations of these parameters. A least squares estimator of the dynamic Riesz representer for the parameter of interest is given. Debiased machine learners are provided and asymptotic theory given.

Submitted 14 October, 2024; v1 submitted 24 August, 2019; originally announced August 2019.

arXiv:1905.10116 [pdf, other]

Semi-Parametric Efficient Policy Learning with Continuous Actions

Authors: Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

Abstract: We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value function takes a known parametric form in the treatment, but we are agnostic on how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to estimation errors of the policy function or the regression model. Our results also apply if the model does not satisfy our semi-parametric form, but rather we measure regret in terms of the best projection of the true value function to this functional space. Our work extends prior approaches of policy optimization from observational data that only considered discrete actions. We provide an experimental evaluation of our method in a synthetic data example motivated by optimal personalized pricing and costly resource allocation.

Submitted 20 July, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

arXiv:1901.03821 [pdf, ps, other]

Mastering Panel 'Metrics: Causal Impact of Democracy on Growth

Authors: Shuowen Chen, Victor Chernozhukov, Iván Fernández-Val

Abstract: The relationship between democracy and economic growth is of long-standing interest. We revisit the panel data analysis of this relationship by Acemoglu, Naidu, Restrepo and Robinson (forthcoming) using state of the art econometric methods. We argue that this and lots of other panel data settings in economics are in fact high-dimensional, resulting in principal estimators -- the fixed effects (FE) and Arellano-Bond (AB) estimators -- to be biased to the degree that invalidates statistical inference. We can however remove these biases by using simple analytical and sample-splitting methods, and thereby restore valid statistical inference. We find that the debiased FE and AB estimators produce substantially higher estimates of the long-run effect of democracy on growth, providing even stronger support for the key hypothesis in Acemoglu, Naidu, Restrepo and Robinson (forthcoming). Given the ubiquitous nature of panel data, we conclude that the use of debiased panel data estimators should substantially improve the quality of empirical inference in economics.

Submitted 12 January, 2019; originally announced January 2019.

Comments: 8 pages, 2 tables, includes supplementary appendix

MSC Class: 62P20

arXiv:1812.10820 [pdf, other]

A $t$-test for synthetic controls

Authors: Victor Chernozhukov, Kaspar Wuthrich, Yinchu Zhu

Abstract: We propose a practical and robust method for making inferences on average treatment effects estimated by synthetic controls. We develop a $K$-fold cross-fitting procedure for bias correction. To avoid the difficult estimation of the long-run variance, inference is based on a self-normalized $t$-statistic, which has an asymptotically pivotal $t$-distribution. Our $t$-test is easy to implement, provably robust against misspecification, and valid with stationary and non-stationary data. It demonstrates an excellent small sample performance in application-based simulations and performs well relative to alternative methods. We illustrate the usefulness of the $t$-test by revisiting the effect of carbon taxes on emissions.

Submitted 17 January, 2024; v1 submitted 27 December, 2018; originally announced December 2018.

arXiv:1812.08089 [pdf, ps, other]

Inference for Heterogeneous Effects using Low-Rank Estimation of Factor Slopes

Authors: Victor Chernozhukov, Christian Hansen, Yuan Liao, Yinchu Zhu

Abstract: We study a panel data model with general heterogeneous effects where slopes are allowed to vary across both individuals and over time. The key dimension reduction assumption we employ is that the heterogeneous slopes can be expressed as having a factor structure so that the high-dimensional slope matrix is low-rank and can thus be estimated using low-rank regularized regression. We provide a simple multi-step estimation procedure for the heterogeneous effects. The procedure makes use of sample-splitting and orthogonalization to accommodate inference following the use of penalized low-rank estimation. We formally verify that the resulting estimator is asymptotically normal allowing simple construction of inferential statements for the individual-time-specific effects and for cross-sectional averages of these effects. We illustrate the proposed method in simulation experiments and by estimating the effect of the minimum wage on employment.

Submitted 4 September, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

arXiv:1812.04345 [pdf, other]

Closing the U.S. gender wage gap requires understanding its heterogeneity

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

Abstract: In 2016, the majority of full-time employed women in the U.S. earned significantly less than comparable men. The extent to which women were affected by gender inequality in earnings, however, depended greatly on socio-economic characteristics, such as marital status or educational attainment. In this paper, we analyzed data from the 2016 American Community Survey using a high-dimensional wage regression and applying double lasso to quantify heterogeneity in the gender wage gap. We found that the gap varied substantially across women and was driven primarily by marital status, having children at home, race, occupation, industry, and educational attainment. We recommend that policy makers use these insights to design policies that will reduce discrimination and unequal pay more effectively.

Submitted 7 June, 2021; v1 submitted 11 December, 2018; originally announced December 2018.

Comments: Main text: 8 pages, 3 figures; Supplementary Material available online

arXiv:1811.11603 [pdf, other]

Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK

Authors: Victor Chernozhukov, Iván Fernández-Val, Siyi Luo

Abstract: We develop a distribution regression model under endogenous sample selection. This model is a semi-parametric generalization of the Heckman selection model. It accommodates much richer effects of the covariates on outcome distribution and patterns of heterogeneity in the selection process, and allows for drastic departures from the Gaussian error structure, while maintaining the same level of tractability as the classical model. The model applies to continuous, discrete and mixed outcomes. We provide identification, estimation, and inference methods, and apply them to obtain wage decomposition for the UK. Here we decompose the difference between the male and female wage distributions into composition, wage structure, selection structure, and selection sorting effects. After controlling for endogenous employment selection, we still find a substantial gender wage gap -- ranging from 21% to 40% throughout the (latent) offered wage distribution -- that is not explained by composition. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution.

Submitted 18 December, 2023; v1 submitted 28 November, 2018; originally announced November 2018.

Comments: 86 pages, 4 tables, 40 figures, includes supplement

MSC Class: 62P20; 91B40

arXiv:1809.05224 [pdf, ps, other]

Automatic Debiased Machine Learning of Causal and Structural Effects

Authors: Victor Chernozhukov, Whitney K Newey, Rahul Singh

Abstract: Many causal and structural effects depend on regressions. Examples include policy effects, average derivatives, regression decompositions, average treatment effects, causal mediation, and parameters of economic structural models. The regressions may be high dimensional, making machine learning useful. Plugging machine learners into identifying equations can lead to poor inference due to bias from regularization and/or model selection. This paper gives automatic debiasing for linear and nonlinear functions of regressions. The debiasing is automatic in using Lasso and the function of interest without the full form of the bias correction. The debiasing can be applied to any regression learner, including neural nets, random forests, Lasso, boosting, and other high dimensional methods. In addition to providing the bias correction we give standard errors that are robust to misspecification, convergence rates for the bias correction, and primitive conditions for asymptotic inference for a variety of estimators of structural and causal effects. The automatic debiased machine learning is used to estimate the average treatment effect on the treated for the NSW job training data and to estimate demand elasticities from Nielsen scanner data while allowing preferences to be correlated with prices and income.

Submitted 21 October, 2022; v1 submitted 13 September, 2018; originally announced September 2018.

Comments: Econometrica 2022

arXiv:1809.04951 [pdf, other]

Valid Simultaneous Inference in High-Dimensional Settings (with the hdm package for R)

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

Abstract: Due to the increasing availability of high-dimensional empirical applications in many research disciplines, valid simultaneous inference becomes more and more important. For instance, high-dimensional settings might arise in economic studies due to very rich data sets with many potential covariates or in the analysis of treatment heterogeneities. Also the evaluation of potentially more complicated (non-linear) functional forms of the regression relationship leads to many potential variables for which simultaneous inferential statements might be of interest. Here we provide a review of classical and modern methods for simultaneous inference in (high-dimensional) settings and illustrate their use by a case study using the R package hdm. The R package hdm implements valid, powerful, and efficient joint hypothesis tests for a potentially large number of coefficients as well as the construction of simultaneous confidence intervals and, therefore, provides useful methods to perform valid post-selection inference based on the LASSO.

Submitted 13 September, 2018; originally announced September 2018.

Comments: 25 pages, 2 figures, 4 tables

arXiv:1809.01038 [pdf, other]

Shape-Enforcing Operators for Point and Interval Estimators

Authors: Xi Chen, Victor Chernozhukov, Iván Fernández-Val, Scott Kostyshak, Ye Luo

Abstract: A common problem in econometrics, statistics, and machine learning is to estimate and make inference on functions that satisfy shape restrictions. For example, distribution functions are nondecreasing and range between zero and one, height growth charts are nondecreasing in age, and production functions are nondecreasing and quasi-concave in input quantities. We propose a method to enforce these restrictions ex post on point and interval estimates of the target function by applying functional operators. If an operator satisfies certain properties that we make precise, the shape-enforced point estimates are closer to the target function than the original point estimates and the shape-enforced interval estimates have greater coverage and shorter length than the original interval estimates. We show that these properties hold for six different operators that cover commonly used shape restrictions in practice: range, convexity, monotonicity, monotone convexity, quasi-convexity, and monotone quasi-convexity. We illustrate the results with two empirical applications to the estimation of a height growth chart for infants in India and a production function for chemical firms in China.

Submitted 12 February, 2021; v1 submitted 4 September, 2018; originally announced September 2018.

Comments: 42 pages, 5 figures, 3 tables, v5 includes changes in the main text

MSC Class: 62F10; 62F25; 62G05; 62G15

arXiv:1808.10532 [pdf, other]

Uniform Inference in High-Dimensional Gaussian Graphical Models

Authors: Sven Klaassen, Jannis Kück, Martin Spindler, Victor Chernozhukov

Abstract: Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models with the number of target parameters $d$ being possibly much larger than the sample size. This is particularly important when certain features or structures of a causal model should be recovered. Our results highlight how in high-dimensional settings graphical models can be estimated and recovered with modern machine learning methods in complex data sets. To construct simultaneous confidence regions on many target parameters, sufficiently fast estimation rates of the nuisance functions are crucial. In this context, we establish uniform estimation rates and sparsity guarantees of the square-root estimator in a random design under approximate sparsity conditions that might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties.

Submitted 3 December, 2018; v1 submitted 30 August, 2018; originally announced August 2018.

Comments: 59 pages, 2 figures, 6 tables

MSC Class: 62H15; 62J07

arXiv:1806.11466 [pdf, ps, other]

Subvector Inference in Partially Identified Models with Many Moment Inequalities

Authors: Alexandre Belloni, Federico Bugni, Victor Chernozhukov

Abstract: This paper considers inference for a function of a parameter vector in a partially identified model with many moment inequalities. This framework allows the number of moment conditions to grow with the sample size, possibly at exponential rates. Our main motivating application is subvector inference, i.e., inference on a single component of the partially identified parameter vector associated with a treatment effect or a policy variable of interest. Our inference method compares a MinMax test statistic (minimum over parameters satisfying $H_0$ and maximum over moment inequalities) against critical values that are based on bootstrap approximations or analytical bounds. We show that this method controls asymptotic size uniformly over a large class of data generating processes despite the partially identified many moment inequality setting. The finite sample analysis allows us to obtain explicit rates of convergence on the size control. Our results are based on combining non-asymptotic approximations and new high-dimensional central limit theorems for the MinMax of the components of random matrices. Unlike the previous literature on functional inference in partially identified models, our results do not rely on weak convergence results based on Donsker's class assumptions and, in fact, our test statistic may not even converge in distribution. Our bootstrap approximation requires the choice of a tuning parameter sequence that can avoid the excessive concentration of our test statistic. To this end, we propose an asymptotically valid data-driven method to select this tuning parameter sequence. This method generalizes the selection of tuning parameter sequences to problems outside the Donsker's class assumptions and may also be of independent interest. Our procedures based on self-normalized moderate deviation bounds are relatively more conservative but easier to implement.

Submitted 29 June, 2018; originally announced June 2018.

arXiv:1806.05081 [pdf, other]

LASSO-Driven Inference in Time and Space

Authors: Victor Chernozhukov, Wolfgang K. Härdle, Chen Huang, Weining Wang

Abstract: We consider the estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak temporal dependence. A sequence of regressions with many regressors using LASSO (Least Absolute Shrinkage and Selection Operator) is applied for variable selection purposes, and an overall penalty level is carefully chosen by a block multiplier bootstrap procedure to account for multiplicity of the equations and dependencies in the data. Correspondingly, oracle properties with a jointly selected tuning parameter are derived. We further provide high-quality de-biased simultaneous inference on the many target parameters of the system. We provide bootstrap consistency results of the test procedure, which are based on a general Bahadur representation for the $Z$-estimators with dependent data. Simulations demonstrate good performance of the proposed inference procedure. Finally, we apply the method to quantify spillover effects of textual sentiment indices in a financial market and to test the connectedness among sectors.

Submitted 15 May, 2020; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1806.01888 [pdf, other]

High-Dimensional Econometrics and Regularized GMM

Authors: Alexandre Belloni, Victor Chernozhukov, Denis Chetverikov, Christian Hansen, Kengo Kato

Abstract: This chapter presents key concepts and theoretical results for analyzing estimation and inference in high-dimensional models. High-dimensional models are characterized by having a number of unknown parameters that is not vanishingly small relative to the sample size. We first present results in a framework where estimators of parameters of interest may be represented directly as approximate means. Within this context, we review fundamental results including high-dimensional central limit theorems, bootstrap approximation of high-dimensional limit distributions, and moderate deviation theory. We also review key concepts underlying inference when many parameters are of interest such as multiple testing with family-wise error rate or false discovery rate control. We then turn to a general high-dimensional minimum distance framework with a special focus on generalized method of moments problems where we present results for estimation and inference about model parameters. The presented results cover a wide array of econometric applications, and we discuss several leading special cases including high-dimensional linear regression and linear instrumental variables models to illustrate the general results.

Submitted 10 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

Comments: 104 pages, 4 figures

arXiv:1803.08154 [pdf, other]

Network and Panel Quantile Effects Via Distribution Regression

Authors: Victor Chernozhukov, Iván Fernández-Val, Martin Weidner

Abstract: This paper provides a method to construct simultaneous confidence bands for quantile functions and quantile effects in nonlinear network and panel models with unobserved two-way effects, strictly exogenous covariates, and possibly discrete outcome variables. The method is based upon projection of simultaneous confidence bands for distribution functions constructed from fixed effects distribution regression estimators. These fixed effects estimators are debiased to deal with the incidental parameter problem. Under asymptotic sequences where both dimensions of the data set grow at the same rate, the confidence bands for the quantile functions and effects have correct joint coverage in large samples. An empirical application to gravity models of trade illustrates the applicability of the methods to network data.

Submitted 8 June, 2020; v1 submitted 21 March, 2018; originally announced March 2018.

Comments: 71 pages, 8 figures, 3 tables, includes supplementary appendix
