Abstract
Hermite series based distribution function estimators have recently been applied in the context of sequential quantile estimation. These distribution function estimators are particularly useful because they allow the online (sequential) estimation of the full cumulative distribution function. This is in contrast to the empirical distribution function estimator and smooth kernel distribution function estimator which only allow sequential cumulative probability estimation at particular values on the support of the associated density function. Hermite series based distribution function estimators are well suited to the settings of streaming data, one-pass analysis of massive data sets and decentralised estimation. In this article we study these estimators in a more general context, thereby redressing a gap in the literature. In particular, we derive new asymptotic consistency results in the mean squared error, mean integrated squared error and almost sure sense. We also present novel Bias-robustness results for these estimators. Finally, we study the finite sample performance of the Hermite series based estimators through a real data example and simulation study. Our results indicate that in the general (non-sequential) context, the Hermite series based distribution function estimators are inferior to smooth kernel distribution function estimators, but may remain compelling in the context of sequential estimation of the full distribution function.
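The online-updating property described in the abstract can be sketched in code: the Hermite coefficient estimates are running means of the orthonormal Hermite functions evaluated at the observations, so each new data point updates all N+1 coefficients in constant memory, and the full CDF can then be read off at any point. The following Python sketch is our own illustration, not code from the article; the class and function names are ours and the quadrature step is a simplification.

```python
import numpy as np

def hermite_functions(N, x):
    """Orthonormal Hermite functions h_0..h_N at points x, computed with the
    stable three-term recurrence h_{k+1} = sqrt(2/(k+1)) x h_k - sqrt(k/(k+1)) h_{k-1}."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    H = np.empty((N + 1, x.size))
    H[0] = np.pi ** -0.25 * np.exp(-x ** 2 / 2)   # h_0(x)
    if N >= 1:
        H[1] = np.sqrt(2.0) * x * H[0]            # h_1(x)
    for k in range(1, N):
        H[k + 1] = np.sqrt(2.0 / (k + 1)) * x * H[k] - np.sqrt(k / (k + 1)) * H[k - 1]
    return H

class HermiteCDF:
    """Sequential Hermite series CDF estimator with N+1 coefficients."""
    def __init__(self, N):
        self.N, self.n, self.a = N, 0, np.zeros(N + 1)

    def update(self, x):
        """Running-mean update of the coefficient estimates a_k with one observation."""
        self.n += 1
        self.a += (hermite_functions(self.N, x)[:, 0] - self.a) / self.n

    def cdf(self, x, lo=-10.0, m=2000):
        """F_hat(x): integrate the density estimate sum_k a_k h_k from lo to x."""
        t = np.linspace(lo, x, m)
        f_hat = self.a @ hermite_functions(self.N, t)
        dt = t[1] - t[0]
        return float(dt * (f_hat.sum() - 0.5 * (f_hat[0] + f_hat[-1])))  # trapezoid rule

rng = np.random.default_rng(0)
est = HermiteCDF(N=20)
for xi in rng.normal(size=5000):
    est.update(xi)
print(est.cdf(0.0))  # close to 0.5 for N(0,1) data
```

Note that, unlike the empirical distribution function, no observations need to be stored: the state is just the coefficient vector, which is what makes the estimator suitable for streaming and decentralised settings.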
Notes
Download: https://mikejareds.github.io/FXData/
References
Altman N, Leger C (1995) Bandwidth selection for kernel distribution function estimation. J Stat Plan Inference 46(2):195–214
Arfken GB, Weber HJ (2001) Mathematical methods for physicists, 5th edn. Harcourt/Academic Press, Burlington
Babu GJ, Canty AJ, Chaubey YP (2002) Application of Bernstein polynomials for smooth estimation of a distribution and density function. J Statist Plann Inference 105(2):377–392. https://doi.org/10.1016/S0378-3758(01)00265-8
Ciesielski Z, Zieliński R (2009) Polynomial and spline estimators of the distribution function with prescribed accuracy. Appl Math (Warsaw) 36(1):1–12. https://doi.org/10.4064/am36-1-1
Dharmadhikari S, Jogdeo K (1969) Bounds on moments of certain random variables. Ann Math Stat 40(4):1506–1509
DLMF (2017) NIST Digital Library of Mathematical Functions. In: Olver FWJ, Olde Daalhuis AB, Lozier DW, Schneider BI, Boisvert RF, Clark CW, Miller BR, Saunders BV (eds). http://dlmf.nist.gov/, Release 1.0.17 of 2017-12-22
Greblicki W, Pawlak M (1984) Hermite series estimates of a probability density and its derivatives. J Multivar Anal 15(2):174–182. https://doi.org/10.1016/0047-259X(84)90024-1
Greblicki W, Pawlak M (1985) Pointwise consistency of the Hermite series density estimate. Statist Probab Lett 3(2):65–69. https://doi.org/10.1016/0167-7152(85)90026-4
Hall P (1983) Orthogonal series distribution function estimation, with applications. J R Stat Soc Ser B (Methodol) 45(1):81–88
Hampel FR (1974) The influence curve and its role in robust estimation. J Am Stat Assoc 69(346):383–393
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (2011) Robust statistics: the approach based on influence functions, vol 196. Wiley, London
Jmaei A, Slaoui Y, Dellagi W (2017) Recursive distribution estimator defined by stochastic approximation method using Bernstein polynomials. J Nonparametr Stat 29(4):792–805. https://doi.org/10.1080/10485252.2017.1369538
Kendall MG, Stuart A, Ord JK (1987) Kendall’s advanced theory of statistics. v. 1: Distribution theory. Griffin, London
Leblanc A (2012) On estimating distribution functions using Bernstein polynomials. Ann Inst Statist Math 64(5):919–943. https://doi.org/10.1007/s10463-011-0339-4
Liebscher E (1990) Hermite series estimators for probability densities. Metrika 37(6):321–343. https://doi.org/10.1007/BF02613540
Marron JS, Wand MP (1992) Exact mean integrated squared error. Ann Stat 712–736
Nadaraya EA (1964) Some new estimates for distribution functions. Theory Probab Its Appl 9(3):497–500
Pawlak M, Stadtmüller U (2018) On certain integral operators associated with Hermite and Laguerre polynomials. Applicationes Mathematicae 1–20
Pepelyshev A, Rafajłowicz E, Steland A (2014) Estimation of the quantile function using Bernstein–Durrmeyer polynomials. J Nonparametr Stat 26(1):1–20. https://doi.org/10.1080/10485252.2013.826355
Reiss RD (1981) Nonparametric estimation of smooth distribution functions. Scand J Statist 8(2):116–119
Schwartz SC (1967) Estimation of probability density by an orthogonal series. Ann Math Statist 38:1261–1265. https://doi.org/10.1214/aoms/1177698795
Slaoui Y (2014) The stochastic approximation method for estimation of a distribution function. Math Methods Stat 23(4):306–325
Stephanou M, Varughese M, Macdonald I (2017) Sequential quantiles via Hermite series density estimation. Electron J Stat 11(1):570–607. https://doi.org/10.1214/17-EJS1245
Szego G (1975) Orthogonal polynomials, 4th edn. American Mathematical Society Colloquium Publications, Vol. XXIII. American Mathematical Society, Providence, RI
van der Vaart AW (1998) Asymptotic statistics, Cambridge series in statistical and probabilistic mathematics, vol 3. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511802256
Walter GG (1977) Properties of Hermite series estimation of probability density. Ann Statist 5(6):1258–1264
Watson GS, Leadbetter MR (1964) Hazard analysis. II. Sankhyā Ser A 26:101–116
Acknowledgements
We would like to sincerely thank the reviewers for their insightful and useful comments which helped us materially improve this article.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
A Lemmas
The first lemma is due to Greblicki and Pawlak (1985) (Lemma 1 in that paper), which we restate without proof as Lemma 1 below:
Lemma 1
(Greblicki and Pawlak 1985)
at every differentiability point of f. If \(f \in L_{p}\), \(p>1\), the convergence holds for almost all \(x \in \mathbb {R}\).
The second lemma is due to Liebscher (1990) (Lemma 5 in that paper), which we present without proof as Lemma 2 below:
Lemma 2
(Liebscher 1990) For the Hermite series estimators (3):
The third lemma is due to Greblicki and Pawlak (1984) (it follows from equation (15) in Theorem 4 of that paper), which we restate without proof as Lemma 3 below:
Lemma 3
(Greblicki and Pawlak 1984) For the Hermite series estimators (3), if \(E|X|^{s} < \infty \) with \(s > \frac{8(r+1)}{3(2r+1)}\), then:
Finally, we present an important novel result with proof in Lemma 4 below. We will make use of Lemma 4 several times in this article.
Lemma 4
where \(c_{1}\) and \(d_{1}\) are positive constants.
Proof
This follows from the inequalities implied by Theorem 8.91.3 of Szego (1975), namely, \(\max _{|x|\le a} |h_{k}(x)| \le c_{a} (k+1)^{-\frac{1}{4}}\) and \(\max _{|x|\ge a}|h_{k}(x)||x|^{\lambda } \le d_{a} (k+1)^{s} \), where \(c_{a}\) and \(d_{a}\) are positive constants depending only on a, \(s=\max (\frac{\lambda }{2} - \frac{1}{12}, -\frac{1}{4})\), and we have set \(\lambda =1+b, \, b>0\). In addition, we have made use of \(\int _{1}^{\infty } x^{-1-b}\,dx = \frac{1}{b}, \, b>0\). For concreteness we have set \(b=\frac{1}{6}\). \(\square \)
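To make the constants concrete, the choice \(b=\frac{1}{6}\) made in the proof evaluates as follows (a worked instance of the quantities above):

```latex
\lambda = 1 + b = \tfrac{7}{6}, \qquad
s = \max\!\left(\tfrac{\lambda}{2} - \tfrac{1}{12},\; -\tfrac{1}{4}\right)
  = \max\!\left(\tfrac{7}{12} - \tfrac{1}{12},\; -\tfrac{1}{4}\right)
  = \tfrac{1}{2}, \qquad
\int_{1}^{\infty} x^{-7/6}\,dx = 6 .
```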
B Proofs of propositions and theorems
1.1 B.1 Proof of Proposition 1
Proof
This follows from (3), (4), the fact that \(E(\hat{a}_k)=a_k\) and Lemma 1. By the monotone convergence theorem we have,
Utilising Lemma 4 we have:
where we have also used the fact that by assumption, \((x-\frac{d}{dx})^r f(x) \in L_{2}\) and Walter (1977) has shown \(a_{k}^{2} \le \frac{b_{k+r}^{2}}{(k+1)^r}\), where \(b_k\) is the k-th coefficient of the expansion of \((x-\frac{d}{dx})^r f(x) \in L_{2}\). In addition, we have utilised Parseval’s theorem, \(||(x-\frac{d}{dx})^r f(x)||^{2} = \sum _{k=0}^{\infty } b_{k}^{2}\), and the Cauchy–Schwarz inequality in the last line. Using the well known properties of the Hurwitz Zeta function, \(\zeta (s,a) = \sum _{k=0}^{\infty }(k+a)^{-s}\), (see DLMF 2017, 25.11.43), we have:
completing the proof. \(\square \)
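The decay behind the Hurwitz zeta step above can be seen from a standard integral comparison (a textbook estimate, restated here for convenience): for \(s > 1\),

```latex
\sum_{k=N+1}^{\infty} (k+1)^{-s}
= \sum_{m=N+2}^{\infty} m^{-s}
= \zeta(s,\, N+2)
\;\le\; \int_{N+1}^{\infty} t^{-s}\,dt
= \frac{(N+1)^{1-s}}{s-1},
```

so tail sums of this form are \(O(N^{1-s})\).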
1.2 B.2 Proof of Proposition 2
Proof
It is easy to see that
Now by virtue of Lemma 4 we have:
Making use of Lemma 2 we have,
\(\square \)
1.3 B.3 Proof of Theorem 3
Proof
We begin by restating the definition of the rate of almost sure convergence provided in Greblicki and Pawlak (1984): for a sequence of random variables \(Y_{n}\), we say that \(Y_{n}=O(a_{n})\) almost surely if \(\frac{\beta _{n} Y_{n}}{a_n} \rightarrow 0\) almost surely as \(n \rightarrow \infty \), for all (non-negative) sequences \(\{\beta _{n}\}\) convergent to zero. Now,
By Proposition 1,
In addition, via (8) we have:
We make use of Lemma 3 to obtain,
and finally:
\(\square \)
1.4 B.4 Proof of Theorem 4
Proof
It suffices to prove \(\sum _{n=1}^{\infty } P\left( |\hat{F}_{N(n)} (x)- F(x)| > \epsilon \right) < \infty \) for all \(\epsilon > 0\) (by the Borel–Cantelli lemma). We have, via the law of total probability,
where c is a constant. By the assumption that \(\sum _{n=1}^{\infty } P\left( \frac{N(n)}{n^{\gamma }} > \epsilon \right) < \infty \) for all \(\epsilon > 0\), it is clear that the first term is finite. It remains to show that \(\sum _{n=1}^{\infty } P\left( |\hat{F}_{N(n)} (x)- F(x)| > \epsilon \big | N(n) \le cn^{\gamma } \right) < \infty \) for all \(\epsilon > 0\). By the conditional Markov inequality we have:
for all \(\epsilon > 0\). Using the fact that \(|f+g|^{p} \le 2^{p-1} (|f|^{p} +|g|^{p})\) along with the Hölder inequality, Lemma 4 and Proposition 1 we have,
for all \(\epsilon > 0\), where \(b_{1}, b_{2}\) are positive constants. Now, the results of Dharmadhikari and Jogdeo (1969) for independent random variables, \(X_{i}\), with zero mean imply that (Theorem 2 in that paper):
where \(\nu \ge 2\) and \(F_{\nu }\) is a constant depending only on \(\nu \). Thus we have \(E|\hat{a}_k-a_k |^{p} = n^{-p} E|\sum _{i=1}^{n} (h_k(\mathbf {x_i}) - a_k)|^{p} \le F_{p} n^{-p/2-1}\sum _{i=1}^{n} E|h_k(\mathbf {x_i}) - a_k|^{p} \), where \(F_{p}\) is a constant depending only on p. Also noting that \(\max _{x} |h_{k}(x)| \le C (k+1)^{-1/12}\), where C is a positive constant (implied by Theorem 8.91.3 of Szego 1975), we have:
for all \(\epsilon > 0\), where \(b_{3}\) depends only on p. It is easy to see that for \(r>2\) and \(q(n) = O(n^{\gamma })\), \(0< \gamma < 6/17\), we can choose p such that \(\sum _{n=1}^{\infty } P\left( |\hat{F}_{N(n)} (x)- F(x)| > \epsilon | N(n) = q(n) \right) < \infty \) for all \(\epsilon > 0\). \(\square \)
1.5 B.5 Proof of Proposition 3
Proof
The fixed N Hermite series estimator (4) (equal to (5)) can be represented as:
where \(d_{N}(t,y) = \sum _{k=0}^{N} h_{k}(t) h_{k}(y)\). The influence function and empirical influence function are:
Now, for fixed N,
where \(u_1\) and \(v_1\) are constants. The result (10) follows from Lemma 4 and the fact that \(\max _{t} |h_{k}(t)| \le C (k+1)^{-1/12}\). Thus, the gross-error sensitivities satisfy \(\sup _{x'} |IF(x,x';T,F)|< \infty \) and \(\sup _{x'} |IF(x,x';T,\hat{F}_{n})| < \infty \), and the fixed N Hermite series cumulative distribution function estimator is Bias-robust. \(\square \)
1.6 B.6 Proof of Proposition 4
Proof
The kernel distribution function estimator is defined as:
This has the representation:
where \(\hat{F}_{n}\) is the empirical distribution function. The influence function and empirical influence function are easily seen to be:
Since \(\int _{-\infty }^{\infty } K(u) du = 1\), it is clear that \(\sup _{x'} |IF(x,x';T,F)| \le 2 < \infty \), \(\sup _{x'}|IF(x,x';T,\hat{F}_{n})| \le 2 < \infty \), and thus the smooth kernel distribution function estimator is Bias-robust. \(\square \)
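As a concrete companion to this proposition: with a Gaussian kernel, the smooth kernel CDF estimator averages the integrated kernel \(\Phi((x - X_i)/h)\) over the sample, and each contribution lies in [0, 1], which is where the bounded influence comes from. A minimal Python sketch follows (our own illustration; the bandwidth value is arbitrary, not a recommendation from the article).

```python
import numpy as np
from scipy.special import ndtr  # standard normal CDF = integrated Gaussian kernel

def kernel_cdf(data, x, h):
    """Smooth kernel distribution function estimate at x with bandwidth h:
    the average of Phi((x - X_i)/h) over the sample. Each term lies in [0, 1],
    so a single contaminating observation shifts the estimate by at most 1/n."""
    return float(np.mean(ndtr((x - np.asarray(data)) / h)))

rng = np.random.default_rng(1)
data = rng.normal(size=2000)
print(kernel_cdf(data, 0.0, h=0.3))  # close to 0.5 for N(0,1) data
```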
1.7 B.7 Proof of Proposition 5
Proof
Suppose a density function, f(x), can be expanded formally as:
where \(\phi (x)=\frac{e^{-x^2/2}}{\sqrt{2\pi }}\) and \(He_{k}(x)\) are the Chebyshev-Hermite polynomials (following the notation of Szego (1975)). The truncated expansion has the form:
which is usually truncated to obtain:
where \(\mu _{2},\mu _{3},\mu _{4}\) are non-central moments. This is the Gram–Charlier series of Type A (Kendall et al. 1987). A natural cumulative distribution function estimator based on the Gram–Charlier series is:
This has the representation
Now:
Since \(He_{k}(x')\) is not bounded, whereas the second terms of \(IF(x,x';T,F)\) and \(IF(x,x';T,\hat{F}_{n})\) are bounded, the gross-error sensitivities, \(\sup _{x'} |IF(x,x';T,F)|\) and \(\sup _{x'} |IF(x,x';T,\hat{F}_{n})|\) are not bounded and thus the CDF estimator (11) is not Bias-robust. \(\square \)
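For concreteness (these are standard formulas, stated here rather than taken from the article), the low-order Chebyshev–Hermite polynomials appearing in the truncated expansion are

```latex
He_2(x) = x^2 - 1, \qquad
He_3(x) = x^3 - 3x, \qquad
He_4(x) = x^4 - 6x^2 + 3,
```

each of which grows polynomially as \(|x'| \rightarrow \infty\), which is exactly why the gross-error sensitivities in the proof are unbounded.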
Cite this article
Stephanou, M., Varughese, M. On the properties of Hermite series based distribution function estimators. Metrika 84, 535–559 (2021). https://doi.org/10.1007/s00184-020-00785-z
Keywords
- Hermite series distribution function estimator
- Sequential distribution function estimators
- Mean squared error
- Almost sure convergence
- Robustness
- Influence functions