Kernel density estimation based on the distinct units in sampling with replacement. (English) Zbl 1476.62070

Summary: This paper considers the problem of estimating density functions by the kernel method from the set of distinct units obtained in sampling with replacement. Using a combined design-model-based inference framework, which accounts for both the underlying superpopulation model and the randomization distribution induced by the sampling design, we derive asymptotic expressions for the bias and mean integrated squared error (MISE) of a Parzen-Rosenblatt-type kernel density estimator (KDE) based on the distinct units. We also prove the asymptotic normality of the distinct units KDE under both the design-based and the combined inference frameworks. Additionally, we give asymptotic MISE formulas for several alternative estimators, including the estimator based on the full with-replacement sample and estimators based on without-replacement sampling of similar cost, and we use these MISE expressions to compare the estimators asymptotically. Monte Carlo simulations are used to investigate the finite-sample properties of the estimators. The simulation results show that the distinct units KDE and the without-replacement KDEs perform similarly, and that all of them are always superior to the full with-replacement sample KDE. Furthermore, we briefly discuss a Nadaraya-Watson-type kernel regression estimator based on the distinct units from sampling with replacement, derive its MSE under the combined inference framework, and demonstrate its finite-sample properties in a small simulation study. Finally, we extend the distinct units density and regression estimators to two-stage sampling with replacement.
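The basic idea in the summary can be illustrated with a minimal sketch. It assumes simple random sampling with replacement, a Gaussian kernel, equal weights on the distinct units, and a fixed bandwidth; the paper's actual estimator, weighting, and bandwidth choice may differ. The population, the values of `N`, `n`, and `h`, and the function names are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def distinct_units(sample_ids: list[int]) -> list[int]:
    """Distinct unit labels of a with-replacement sample, in first-draw order."""
    return list(dict.fromkeys(sample_ids))


def kde(values: list[float], x: float, h: float) -> float:
    """Equal-weight Parzen-Rosenblatt estimate at x: (1/(m h)) * sum K((x - y_i)/h)."""
    m = len(values)
    return sum(gaussian_kernel((x - y) / h) for y in values) / (m * h)


# Hypothetical finite population of N units drawn from a N(0, 1) superpopulation.
rng = random.Random(0)
N, n, h = 1000, 300, 0.4
population = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Simple random sampling with replacement: n draws, repeats allowed.
draws = [rng.randrange(N) for _ in range(n)]
ids = distinct_units(draws)

# KDE at x = 0 from the full sample (repeats kept) and from the distinct units only.
full_sample_kde = kde([population[i] for i in draws], 0.0, h)
distinct_kde = kde([population[i] for i in ids], 0.0, h)
print(len(ids), full_sample_kde, distinct_kde)
```

In the full-sample version a unit drawn several times contributes several identical kernel terms, whereas the distinct-units version counts each observed unit once. A single run like this only shows the mechanics; comparing the two estimators would require repeating the sampling many times, as in the Monte Carlo study the summary describes.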

MSC:

62G07 Density estimation
62D05 Sampling theory, sample surveys
62G08 Nonparametric regression and quantile regression
