Showing 1–47 of 47 results for author: Kuchibhotla, A K

  1. arXiv:2407.12278  [pdf, other]

    math.ST

    Confidence Sets for $Z$-estimation Problems using Self-normalization

    Authors: Woonyoung Chang, Arun Kumar Kuchibhotla

    Abstract: Many commonly used statistical estimators are derived from optimization problems. This includes maximum likelihood estimation, empirical risk minimization, and so on. In many cases, the resulting estimators can be written as solutions to estimating equations, sometimes referred to as $Z$-estimators. Asymptotic normality for $Z$-estimators is a well-known result albeit when the dimension of the par…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2405.06437  [pdf, ps, other]

    math.ST

    Generalized van Trees inequality: Local minimax bounds for non-smooth functionals and irregular statistical models

    Authors: Kenta Takatsu, Arun Kumar Kuchibhotla

    Abstract: In a decision-theoretic framework, the minimax lower bound provides the worst-case performance of estimators relative to a given class of statistical models. For parametric and semiparametric models, the Hájek--Le Cam local asymptotic minimax (LAM) theorem provides the sharp local asymptotic lower bound. Despite its relative generality, this result comes with limitations as it only applies to the…

    Submitted 18 October, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  3. arXiv:2403.06357  [pdf, other]

    math.ST

    Inference for Median and a Generalization of HulC

    Authors: Manit Paul, Arun Kumar Kuchibhotla

    Abstract: Constructing distribution-free confidence intervals for the median, a classic problem in statistics, has seen numerous solutions in the literature. While coverage validity has received ample attention, less has been explored about interval width. Our study breaks new ground by investigating the width of these intervals under non-standard assumptions. Surprisingly, we find that properly scaled, the…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 75 pages, 10 figures

  4. arXiv:2311.16598  [pdf, ps, other]

    math.ST stat.ME

    Rectangular Hull Confidence Regions for Multivariate Parameters

    Authors: Aniket Jain, Arun K Kuchibhotla

    Abstract: We introduce three notions of multivariate median bias, namely, rectilinear, Tukey, and orthant median bias. Each of these median biases is zero under a suitable notion of multivariate symmetry. We study the coverage probabilities of rectangular hull of $B$ independent multivariate estimators, with special attention to the number of estimators $B$ needed to ensure a miscoverage of at most $α$. It…

    Submitted 5 December, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Added the proof of Proposition 3.9

  5. arXiv:2310.20058  [pdf, other]

    math.ST stat.ME

    New Asymptotic Limit Theory and Inference for Monotone Regression

    Authors: Soham Mallick, Siddhaarth Sarkar, Arun Kumar Kuchibhotla

    Abstract: Nonparametric regression problems with qualitative constraints such as monotonicity or convexity are ubiquitous in applications. For example, in predicting the yield of a factory in terms of the number of labor hours, the monotonicity of the conditional mean function is a natural constraint. One can estimate a monotone conditional mean function using nonparametric least squares estimation, which i…

    Submitted 17 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Additional simulations added along with link to R code files

  6. arXiv:2310.07935  [pdf, other]

    stat.ME stat.AP

    Estimating the Likelihood of Arrest from Police Records in Presence of Unreported Crimes

    Authors: Riccardo Fogliato, Arun Kumar Kuchibhotla, Zachary Lipton, Daniel Nagin, Alice Xiang, Alexandra Chouldechova

    Abstract: Many important policy decisions concerning policing hinge on our understanding of how likely various criminal offenses are to result in arrests. Since many crimes are never reported to law enforcement, estimates based on police records alone must be adjusted to account for the likelihood that each crime would have been reported to the police. In this paper, we present a methodological framework fo…

    Submitted 11 October, 2023; originally announced October 2023.

  7. arXiv:2307.16798  [pdf, other]

    stat.ME math.ST stat.AP

    Forster-Warmuth Counterfactual Regression: A Unified Learning Approach

    Authors: Yachong Yang, Arun Kumar Kuchibhotla, Eric Tchetgen Tchetgen

    Abstract: Series or orthogonal basis regression is one of the most popular non-parametric regression techniques in practice, obtained by regressing the response on features generated by evaluating the basis functions at observed covariate values. The most routinely used series estimator is based on ordinary least squares fitting, which is known to be minimax rate optimal in various settings, albeit under st…

    Submitted 20 March, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: Add grant acknowledgement

  8. arXiv:2307.05732  [pdf, ps, other]

    stat.ME math.ST

    From isotonic to Lipschitz regression: a new interpolative perspective on shape-restricted estimation

    Authors: Kenta Takatsu, Tianyu Zhang, Arun Kumar Kuchibhotla

    Abstract: This manuscript seeks to bridge two seemingly disjoint paradigms of nonparametric regression: estimation based on smoothness assumptions and shape constraints. The proposed approach is motivated by a conceptually simple observation: Every Lipschitz function is a sum of monotonic and linear functions. This principle is further generalized to the higher-order monotonicity and multivariate covariates…

    Submitted 13 October, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

  9. arXiv:2307.00795  [pdf, other]

    math.ST

    Inference for Projection Parameters in Linear Regression: beyond $d = o(n^{1/2})$

    Authors: Woonyoung Chang, Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: We consider the problem of inference for projection parameters in linear regression with increasing dimensions. This problem has been studied under a variety of assumptions in the literature. The classical asymptotic normality result for the least squares estimator of the projection parameter only holds when the dimension $d$ of the covariates is of a smaller order than $n^{1/2}$, where $n$ is the…

    Submitted 11 January, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Updated Jan 11, 2024

  10. arXiv:2306.14382  [pdf, ps, other]

    math.PR math.ST stat.AP

    Central Limit Theorems and Approximation Theory: Part II

    Authors: Arun Kumar Kuchibhotla

    Abstract: In Part I of this article (Banerjee and Kuchibhotla (2023)), we introduced a new method to bound the difference in expectations of an average of independent random vectors and the limiting Gaussian random vector using level sets. In the current article, we further explore this idea using finite sample Edgeworth expansions and also establish integral representation theorems.

    Submitted 25 June, 2023; originally announced June 2023.

  11. arXiv:2306.14299  [pdf, ps, other]

    math.PR math.ST

    Dual Induction CLT for High-dimensional m-dependent Data

    Authors: Heejong Bong, Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: We derive novel and sharp high-dimensional Berry--Esseen bounds for the sum of $m$-dependent random vectors over the class of hyper-rectangles exhibiting only a poly-logarithmic dependence in the dimension. Our results hold under minimal assumptions, such as non-degenerate covariances and finite third moments, and yield a sample complexity of order $\sqrt{m/n}$, aside from logarithmic terms, match…

    Submitted 16 November, 2023; v1 submitted 25 June, 2023; originally announced June 2023.

    Comments: 25 pages

    MSC Class: 60B12; 60F05

  12. arXiv:2306.05947  [pdf, ps, other]

    math.ST stat.AP

    Central Limit Theorems and Approximation Theory: Part I

    Authors: Arisina Banerjee, Arun K Kuchibhotla

    Abstract: Central limit theorems (CLTs) have a long history in probability and statistics. They play a fundamental role in constructing valid statistical inference procedures. Over the last century, various techniques have been developed in probability and statistics to prove CLTs under a variety of assumptions on random variables. Quantitative versions of CLTs (e.g., Berry--Esseen bounds) have also been pa…

    Submitted 25 June, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 25 pages

  13. arXiv:2304.13016  [pdf, other]

    math.ST cs.LG stat.ML

    Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation

    Authors: Jin-Hong Du, Pratik Patil, Arun Kumar Kuchibhotla

    Abstract: We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $λ$ and the limiting subsample aspect ratio $φ_s$ (the ratio of the feature size to the subsample size), we…

    Submitted 16 July, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

    Comments: 47 pages, 11 figures; this version fixes minor typos. arXiv admin note: text overlap with arXiv:2210.11445

  14. arXiv:2304.06158  [pdf, other]

    stat.ME stat.ML

    Post-selection Inference for Conformal Prediction: Trading off Coverage for Precision

    Authors: Siddhaarth Sarkar, Arun Kumar Kuchibhotla

    Abstract: Conformal inference has played a pivotal role in providing uncertainty quantification for black-box ML prediction algorithms with finite sample guarantees. Traditionally, conformal prediction inference requires a data-independent specification of miscoverage level. In practical applications, one might want to update the miscoverage level after computing the prediction set. For example, in the cont…

    Submitted 30 June, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

  15. Extrapolated cross-validation for randomized ensembles

    Authors: Jin-Hong Du, Pratik Patil, Kathryn Roeder, Arun Kumar Kuchibhotla

    Abstract: Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our m…

    Submitted 15 December, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

    Comments: Accepted by the Journal of Computational and Graphical Statistics

  16. arXiv:2302.03850  [pdf, ps, other]

    math.ST math.PR

    Tight Concentration Inequality for Sub-Weibull Random Variables with Generalized Bernstien Orlicz norm

    Authors: Heejong Bong, Arun Kumar Kuchibhotla

    Abstract: Recent development in high-dimensional statistical inference has necessitated concentration inequalities for a broader range of random variables. We focus on sub-Weibull random variables, which extend sub-Gaussian or sub-exponential random variables to allow heavy-tailed distributions. This paper presents concentration inequalities for independent sub-Weibull random variables with finite Generaliz…

    Submitted 25 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    MSC Class: 60G50; 60E15 (Primary) 60B20; 62E22 (Secondary)

  17. arXiv:2212.05355  [pdf, ps, other]

    math.PR math.ST

    High-dimensional Berry-Esseen Bound for $m$-Dependent Random Samples

    Authors: Heejong Bong, Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: In this work, we provide a $(n/m)^{-1/2}$-rate finite sample Berry-Esseen bound for $m$-dependent high-dimensional random vectors over the class of hyper-rectangles. This bound imposes minimal assumptions on the random vectors such as nondegenerate covariances and finite third moments. The proof uses inductive relationships between anti-concentration inequalities and Berry--Esseen bounds, which ar…

    Submitted 10 December, 2022; originally announced December 2022.

  18. arXiv:2210.11445  [pdf, other]

    math.ST stat.ML

    Bagging in overparameterized learning: Risk characterization and risk monotonization

    Authors: Pratik Patil, Jin-Hong Du, Arun Kumar Kuchibhotla

    Abstract: Bagging is a commonly used ensemble technique in statistics and machine learning to improve the performance of prediction procedures. In this paper, we study the prediction risk of variants of bagged predictors under the proportional asymptotics regime, in which the ratio of the number of features to the number of observations converges to a constant. Specifically, we propose a general strategy to…

    Submitted 24 October, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: 102 pages, 34 figures; this version adds minor clarifications in a few places

  19. arXiv:2206.02954  [pdf, ps, other]

    math.ST stat.ME

    Median Regularity and Honest Inference

    Authors: Arun Kumar Kuchibhotla, Sivaraman Balakrishnan, Larry Wasserman

    Abstract: We introduce a new notion of regularity of an estimator called median regularity. We prove that uniformly valid (honest) inference for a functional is possible if and only if there exists a median regular estimator of that functional. To our knowledge, such a notion of regularity that is necessary for uniformly valid inference is unavailable in the literature.

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 10 pages

  20. arXiv:2205.12937  [pdf, other]

    math.ST cs.LG stat.ML

    Mitigating multiple descents: A model-agnostic framework for risk monotonization

    Authors: Pratik Patil, Arun Kumar Kuchibhotla, Yuting Wei, Alessandro Rinaldo

    Abstract: Recent empirical and theoretical analyses of several commonly used prediction procedures reveal a peculiar risk behavior in high dimensions, referred to as double/multiple descent, in which the asymptotic risk is a non-monotonic function of the limiting aspect ratio of the number of features or parameters to the sample size. To mitigate this undesirable behavior, we develop a general framework for…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: 110 pages, 15 figures

  21. arXiv:2203.01761  [pdf, other]

    stat.ME math.ST

    Doubly Robust Calibration of Prediction Sets under Covariate Shift

    Authors: Yachong Yang, Arun Kumar Kuchibhotla, Eric Tchetgen Tchetgen

    Abstract: Conformal prediction has received tremendous attention in recent years and has offered new solutions to problems in missing data and causal inference; yet these advances have not leveraged modern semiparametric efficiency theory for more robust and efficient uncertainty quantification. In this paper, we consider the problem of obtaining distribution-free prediction regions accounting for a shift i…

    Submitted 13 December, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: New contribution about impossibility of finite sample results and asymptotic conditional coverage through CQR score

  22. arXiv:2111.09211  [pdf, other]

    stat.AP

    Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Optimal Transport and Conformal Prediction Sets

    Authors: Richard A. Berk, Arun Kumar Kuchibhotla, Eric Tchetgen Tchetgen

    Abstract: In the United States and elsewhere, risk assessment algorithms are being used to help inform criminal justice decision-makers. A common intent is to forecast an offender's ``future dangerousness.'' Such algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we use counterfactual reasoning to consider the pr…

    Submitted 9 August, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: 51 pages, 7 figures

  23. arXiv:2106.11188  [pdf, other]

    stat.ME stat.CO

    maars: Tidy Inference under the 'Models as Approximations' Framework in R

    Authors: Riccardo Fogliato, Shamindra Shrotriya, Arun Kumar Kuchibhotla

    Abstract: Linear regression using ordinary least squares (OLS) is a critical part of every statistician's toolkit. In R, this is elegantly implemented via lm() and its related functions. However, the statistical inference output from this suite of functions is based on the assumption that the model is well specified. This assumption is often unrealistic and at best satisfied approximately. In the statistics…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: The first two authors contributed equally to this work and are ordered alphabetically

  24. arXiv:2106.00164  [pdf, ps, other]

    math.ST

    Median bias of M-estimators

    Authors: Arun Kumar Kuchibhotla

    Abstract: In this note, we derive bounds on the median bias of univariate M-estimators under mild regularity conditions. These requirements are not sufficient to imply convergence in distribution of the M-estimators. We also discuss median bias of some multivariate M-estimators.

    Submitted 31 May, 2021; originally announced June 2021.

  25. arXiv:2105.14577  [pdf, other]

    math.ST stat.CO stat.ME

    The HulC: Confidence Regions from Convex Hulls

    Authors: Arun Kumar Kuchibhotla, Sivaraman Balakrishnan, Larry Wasserman

    Abstract: We develop and analyze the HulC, an intuitive and general method for constructing confidence sets using the convex hull of estimates constructed from subsets of the data. Unlike classical methods which are based on estimating the (limiting) distribution of an estimator, the HulC is often simpler to use and effectively bypasses this step. In comparison to the bootstrap, the HulC requires fewer regu…

    Submitted 8 September, 2023; v1 submitted 30 May, 2021; originally announced May 2021.

    Comments: Latest version. Fixed a gap in Proposition and Theorem 1 pointed out by Prof. Hannes Leeb. Now all the simulations include a comparison with subsampling. Also, added several new simulation settings including quantile regression, isotonic regression both under non-standard assumptions

  26. arXiv:2104.13871  [pdf, other]

    stat.ME

    Selection and Aggregation of Conformal Prediction Sets

    Authors: Yachong Yang, Arun Kumar Kuchibhotla

    Abstract: Conformal prediction is a generic methodology for finite-sample valid distribution-free prediction. This technique has garnered a lot of attention in the literature partly because it can be applied with any machine learning algorithm that provides point predictions to yield valid prediction regions. Of course, the efficiency (width/volume) of the resulting prediction region depends on the performa…

    Submitted 11 April, 2024; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: 46 pages, 2 figures

  27. arXiv:2104.09358  [pdf, other]

    stat.ME stat.AP

    Nested Conformal Prediction Sets for Classification with Applications to Probation Data

    Authors: Arun K. Kuchibhotla, Richard A. Berk

    Abstract: Risk assessments to help inform criminal justice decisions have been used in the United States since the 1920s. Over the past several years, statistical learning risk algorithms have been introduced amid much controversy about fairness, transparency and accuracy. In this paper, we focus on accuracy for a large department of probation and parole that is considering a major revision of its current,…

    Submitted 13 April, 2021; originally announced April 2021.

  28. arXiv:2009.13673  [pdf, ps, other]

    math.ST

    High-dimensional CLT for Sums of Non-degenerate Random Vectors: $n^{-1/2}$-rate

    Authors: Arun Kumar Kuchibhotla, Alessandro Rinaldo

    Abstract: In this note, we provide Berry--Esseen bounds for rectangles in high dimensions when the random vectors have non-singular covariance matrices. Under this assumption of non-singularity, we prove an $n^{-1/2}$ scaling for the Berry--Esseen bound for sums of mean independent random vectors with a finite third moment. The proof is essentially the method of compositions proof of multivariate Berry--E…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: 21 pages

  29. arXiv:2008.11664   

    stat.AP stat.ML

    Improving Fairness in Criminal Justice Algorithmic Risk Assessments Using Conformal Prediction Sets

    Authors: Richard A. Berk, Arun Kumar Kuchibhotla

    Abstract: Risk assessment algorithms have been correctly criticized for potential unfairness, and there is an active cottage industry trying to make repairs. In this paper, we adopt a framework from conformal prediction sets to remove unfairness from risk algorithms themselves and the covariates used for forecasting. From a sample of 300,000 offenders at their arraignments, we construct a confusion table an…

    Submitted 21 May, 2021; v1 submitted 26 August, 2020; originally announced August 2020.

    Comments: We found an interpretive error in the method. We are trying now to develop a better approach

  30. arXiv:2007.09751  [pdf, ps, other]

    math.ST stat.ME

    Berry-Esseen Bounds for Projection Parameters and Partial Correlations with Increasing Dimension

    Authors: Arun Kumar Kuchibhotla, Alessandro Rinaldo, Larry Wasserman

    Abstract: We provide finite sample bounds on the Normal approximation to the law of the least squares estimator of the projection parameters normalized by the sandwich-based standard errors. Our results hold in the increasing dimension setting and under minimal assumptions on the data generating distribution. In particular, we do not assume a linear regression function and only require the existence of fini…

    Submitted 22 October, 2021; v1 submitted 19 July, 2020; originally announced July 2020.

    Comments: 58 pages, 0 figures

  31. arXiv:2006.05022  [pdf, other]

    math.ST cs.AI cs.LG stat.AP stat.ML

    Near-Optimal Confidence Sequences for Bounded Random Variables

    Authors: Arun Kumar Kuchibhotla, Qinqing Zheng

    Abstract: Many inference problems, such as sequential decision problems like A/B testing and adaptive sampling schemes like bandit selection, are often online in nature. The fundamental problem for online inference is to provide a sequence of confidence intervals that are valid uniformly over the growing-into-infinity sample sizes. To address this question, we provide a near-optimal confidence sequence for bou…

    Submitted 3 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: Accepted to ICML 2021

  32. arXiv:2005.06095  [pdf, other]

    stat.ME stat.AP stat.ML

    Exchangeability, Conformal Prediction, and Rank Tests

    Authors: Arun Kumar Kuchibhotla

    Abstract: Conformal prediction has been a very popular method of distribution-free predictive inference in recent years in machine learning and statistics. Its popularity stems from the fact that it works as a wrapper around any prediction algorithm such as neural networks or random forests. Exchangeability is at the core of the validity of conformal prediction. The concept of exchangeability is also at the…

    Submitted 3 June, 2021; v1 submitted 12 May, 2020; originally announced May 2020.

    Comments: Added reference to Shi et al. (2020, JASA). Corrected Theorem 3

  33. arXiv:1910.10562  [pdf, other]

    stat.ME cs.AI math.ST stat.ML

    Nested conformal prediction and quantile out-of-bag ensemble methods

    Authors: Chirag Gupta, Arun K. Kuchibhotla, Aaditya K. Ramdas

    Abstract: Conformal prediction is a popular tool for providing valid prediction sets for classification and regression problems, without relying on any distributional assumptions on the data. While the traditional description of conformal prediction starts with a nonconformity score, we provide an alternate (but equivalent) view that starts with a sequence of nested sets and calibrates them to find a valid…

    Submitted 9 May, 2022; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: 38 pages, 4 figures, 8 tables. This version fixes a bug in the proof of Proposition 3. Published paper available at https://www.sciencedirect.com/science/article/abs/pii/S0031320321006725

    Journal ref: Pattern Recognition 127 (2022): 108496

  34. arXiv:1910.06386  [pdf, other]

    math.ST stat.ME

    All of Linear Regression

    Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja, Junhui Cai

    Abstract: Least squares linear regression is one of the oldest and most widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n)\in\mathbb{R}^d\times\mathbb{R}$ (not necessarily independent) are available. Some of the questions we dea…

    Submitted 14 October, 2019; originally announced October 2019.

  35. arXiv:1910.05480  [pdf, ps, other]

    math.ST stat.ML

    First order expansion of convex regularized estimators

    Authors: Pierre C Bellec, Arun K Kuchibhotla

    Abstract: We consider first order expansions of convex penalized estimators in high-dimensional regression problems with random designs. Our setting includes linear regression and logistic regression as special cases. For a given penalty function $h$ and the corresponding penalized estimator $\hatβ$, we construct a quantity $η$, the first order expansion of $\hatβ$, such that the distance between $\hatβ$ an…

    Submitted 8 March, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

    Comments: Accepted at NeurIPS 2019 and published at https://papers.nips.cc/paper/8606-first-order-expansion-of-convex-regularized-estimators . The version here includes the supplementary material

  36. arXiv:1909.02088  [pdf, other]

    math.ST cs.LG stat.ML

    On Least Squares Estimation under Heteroscedastic and Heavy-Tailed Errors

    Authors: Arun K. Kuchibhotla, Rohit K. Patra

    Abstract: We consider least squares estimation in a general nonparametric regression model. The rate of convergence of the least squares estimator (LSE) for the unknown regression function is well studied when the errors are sub-Gaussian. We find upper bounds on the rates of convergence of the LSE when the errors have uniformly bounded conditional variance and have only finitely many moments. We show that t…

    Submitted 8 April, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: 49 pages, 2 figures, and 3 tables

  37. arXiv:1809.10538  [pdf, ps, other]

    math.ST

    Model-free Study of Ordinary Least Squares Linear Regression

    Authors: Arun K. Kuchibhotla, Lawrence D. Brown, Andreas Buja

    Abstract: Ordinary least squares (OLS) linear regression is one of the most basic statistical techniques for data analysis. In the mainstream literature and in statistical education, the study of linear regression is typically restricted to the case where the covariates are fixed and the errors are mean zero Gaussians with variance independent of the (fixed) covariates. Even though OLS has been studied under mis…

    Submitted 27 September, 2018; originally announced September 2018.

    Comments: 33 pages

  38. arXiv:1809.05172  [pdf, ps, other]

    math.ST stat.ML

    Deterministic Inequalities for Smooth M-estimators

    Authors: Arun Kumar Kuchibhotla

    Abstract: Ever since the proof of asymptotic normality of the maximum likelihood estimator by Cramér (1946), it has been understood that a basic technique of the Taylor series expansion suffices for asymptotics of $M$-estimators with smooth/differentiable loss function. Although the Taylor series expansion is a purely deterministic tool, the realization that the asymptotic normality results can also be made det…

    Submitted 13 September, 2018; originally announced September 2018.

    Comments: 49 pages

  39. arXiv:1806.09014  [pdf, other]

    stat.ME

    Assumption Lean Regression

    Authors: Richard Berk, Andreas Buja, Lawrence Brown, Edward George, Arun Kumar Kuchibhotla, Weijie J. Su, Linda Zhao

    Abstract: It is well known that models used in conventional regression analysis are commonly misspecified. A standard response is little more than a shrug. Data analysts invoke Box's maxim that all models are wrong and then proceed as if the results are useful nevertheless. In this paper, we provide an alternative. Regression models are treated explicitly as approximations of a true response surface that ca…

    Submitted 26 June, 2018; v1 submitted 23 June, 2018; originally announced June 2018.

    Comments: Submitted for review, 21 pages, 2 figures

  40. arXiv:1806.06153  [pdf, ps, other]

    math.ST

    High-dimensional CLT: Improvements, Non-uniform Extensions and Large Deviations

    Authors: Arun Kumar Kuchibhotla, Somabha Mukherjee, Debapratim Banerjee

    Abstract: Central limit theorems (CLTs) for high-dimensional random vectors with dimension possibly growing with the sample size have received a lot of attention in recent times. Chernozhukov et al. (2017) proved a Berry--Esseen type result for high-dimensional averages for the class of hyperrectangles and they proved that the rate of convergence can be upper bounded by $n^{-1/6}$ up to a polynomial fact…

    Submitted 24 June, 2019; v1 submitted 15 June, 2018; originally announced June 2018.

    Comments: 76 pages

  41. arXiv:1806.04119  [pdf, ps, other]

    stat.ME math.ST

    Valid Post-selection Inference in Assumption-lean Linear Regression

    Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

    Abstract: Construction of valid statistical inference for estimators based on data-driven selection has received a lot of attention in recent times. Berk et al. (2013) is possibly the first work to provide valid inference for Gaussian homoscedastic linear regression with fixed covariates under arbitrary covariate/variable selection. The setting is unrealistic and is extended by Bachoc et al. (2016) by r…

    Submitted 11 June, 2018; originally announced June 2018.

    Comments: 49 pages

  42. arXiv:1804.02605  [pdf, other]

    math.ST stat.ME stat.ML

    Moving Beyond Sub-Gaussianity in High-Dimensional Statistics: Applications in Covariance Estimation and Linear Regression

    Authors: Arun Kumar Kuchibhotla, Abhishek Chakrabortty

    Abstract: Concentration inequalities form an essential toolkit in the study of high dimensional (HD) statistical methods. Most of the relevant statistics literature in this regard is based on sub-Gaussian or sub-exponential tail assumptions. In this paper, we first bring together various probabilistic inequalities for sums of independent random variables under much more general exponential type (namely sub-…

    Submitted 9 May, 2022; v1 submitted 7 April, 2018; originally announced April 2018.

    Comments: 68 pages; Revised version; To appear in Information and Inference: A Journal of the IMA

    MSC Class: 60G50; 62J05; 60B20; 62J07; 62E17; 60F05; 60E15

    Journal ref: Information and Inference: A Journal of the IMA (2022), Vol. 11, No. 4, 1389-1456

  43. arXiv:1802.05801  [pdf, ps, other]

    math.ST

    Uniform-in-Submodel Bounds for Linear Regression in a Model Free Framework

    Authors: Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, Linda Zhao

    Abstract: For the last two decades, high-dimensional data and methods have proliferated throughout the literature. Yet, the classical technique of linear regression has not lost its usefulness in applications. In fact, many high-dimensional estimation techniques can be seen as variable selection that leads to a smaller set of variables (a ``sub-model'') where classical linear regression applies. We analyze…

    Submitted 17 May, 2021; v1 submitted 15 February, 2018; originally announced February 2018.

    Comments: Forthcoming at Econometric Theory

  44. arXiv:1708.00145  [pdf, other]

    math.ST stat.CO stat.ME

    Semiparametric Efficiency in Convexity Constrained Single Index Model

    Authors: Arun K. Kuchibhotla, Rohit K. Patra, Bodhisattva Sen

    Abstract: We consider estimation and inference in a single index regression model with an unknown convex link function. We introduce a convex and Lipschitz constrained least squares estimator (CLSE) for both the parametric and the nonparametric components given independent and identically distributed observations. We prove the consistency and find the rates of convergence of the CLSE when the errors are ass…

    Submitted 13 January, 2021; v1 submitted 31 July, 2017; originally announced August 2017.

    Comments: Removed the density bounded away from zero assumption in assumption (A5). Weakened assumption (B2)

  45. arXiv:1706.05745  [pdf, other]

    stat.ME

    Statistical Inference based on Bridge Divergences

    Authors: Arun Kumar Kuchibhotla, Somabha Mukherjee, Ayanendranath Basu

    Abstract: M-estimators offer simple robust alternatives to the maximum likelihood estimator. Much of the robustness literature, however, has focused on the problems of location, location-scale and regression estimation rather than on estimation of general parameters. The density power divergence (DPD) and the logarithmic density power divergence (LDPD) measures provide two classes of competitive M-estimator…

    Submitted 18 June, 2017; originally announced June 2017.

    Comments: 45 pages

  46. arXiv:1612.03257  [pdf, other]

    math.ST

    Models as Approximations II: A Model-Free Theory of Parametric Regression

    Authors: Andreas Buja, Lawrence Brown, Arun Kumar Kuchibhotla, Richard Berk, Ed George, Linda Zhao

    Abstract: We develop a model-free theory of general types of parametric regression for iid observations. The theory replaces the parameters of parametric models with statistical functionals, to be called ``regression functionals'', defined on large non-parametric classes of joint $x$-$y$ distributions, without assuming a correct model. Parametric models are reduced to heuristics to suggest plausible objective…

    Submitted 6 July, 2019; v1 submitted 10 December, 2016; originally announced December 2016.

    Comments: Submitted

    MSC Class: 62A01

  47. arXiv:1612.00068  [pdf, other]

    stat.ME

    Efficient Estimation in Single Index Models through Smoothing splines

    Authors: Arun Kumar Kuchibhotla, Rohit Kumar Patra

    Abstract: We consider estimation and inference in a single index regression model with an unknown but smooth link function. In contrast to the standard approach of using kernels or regression splines, we use smoothing splines to estimate the smooth link function. We develop a method to compute the penalized least squares estimators (PLSEs) of the parametric and the nonparametric components given independent…

    Submitted 25 May, 2019; v1 submitted 30 November, 2016; originally announced December 2016.

    Comments: 50 pages, 3 figures, and 2 tables