Occupancy distributions arising in sampling from Gibbs-Poisson abundance models. (English) Zbl 1283.62227

Summary: Estimating the number \(n\) of unseen species from a \(k\)-sample displaying only \(p\leq k\) distinct sampled species has long received attention. It requires a model of species abundance together with a sampling model. We start with a discrete model of iid stochastic species abundances, each with a Gibbs-Poisson distribution. A \(k\)-sample drawn from the \(n\)-species abundance vector is obtained by conditioning this vector on summing to \(k\). We discuss the sampling formulae (species occupancy distributions, frequency of frequencies) in this context. We then develop some aspects of the problem of estimating \(n\) from the size \(k\) of the sample and the observed value of \(P_{n,k}\), the number of distinct sampled species.
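As a hedged illustration of the conditioning step, consider the simplest stand-in: plain Poisson abundances. Treating this as a member of the Gibbs-Poisson family is an assumption made here, not a statement from the summary. Conditioning \(n\) iid Poisson counts on summing to \(k\) yields a symmetric multinomial sample, so the number of distinct species follows the classical occupancy law \(\mathbf{P}(P_{n,k}=p)=\frac{n!}{(n-p)!}\,S(k,p)\,n^{-k}\), with \(S(k,p)\) a Stirling number of the second kind. The function names below are hypothetical, and the cap `n_max` is needed because the likelihood has no finite maximizer when \(p=k\).

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(k, p):
    """Stirling number of the second kind S(k, p), by inclusion-exclusion."""
    total = sum((-1) ** j * comb(p, j) * (p - j) ** k for j in range(p + 1))
    return total // factorial(p)


def occupancy_pmf(n, k):
    """Exact law of the number of distinct species in a k-sample.

    Assumes the Poisson stand-in: the k draws are uniform over n species.
    """
    pmf = {}
    for p in range(min(n, k) + 1):
        falling = factorial(n) // factorial(n - p)  # n (n-1) ... (n-p+1)
        pmf[p] = Fraction(falling * stirling2(k, p), n ** k)
    return pmf


def estimate_n(k, p_obs, n_max):
    """Maximum-likelihood estimate of n over the grid p_obs..n_max."""
    return max(range(p_obs, n_max + 1),
               key=lambda n: occupancy_pmf(n, k)[p_obs])
```

For instance, `occupancy_pmf(3, 2)` assigns probability 1/3 to one distinct species and 2/3 to two, and `estimate_n(3, 2, 50)` selects the smallest admissible value, since the likelihood \(3(n-1)/n^2\) decreases in \(n\geq 2\).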
It is shown that it always makes sense to study these occupancy problems from a Gibbs-Poisson abundance model in the context of a population with infinitely many species. From this extension, a parameter \(\gamma\) naturally appears, which measures the richness or diversity of species. We rederive the sampling formulae for a population with infinitely many species, together with the distribution of the number \(P_k\) of distinct sampled species. We investigate the problem of estimating \(\gamma\) from the sample size \(k\) and the observed value of \(P_k\).
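To make the estimation of a diversity parameter concrete, the sketch below uses one classical infinitely-many-species model, the Ewens sampling formula, with its parameter playing the role of \(\gamma\). That identification is an assumption made for illustration only; the summary does not say which models are treated. Under the Ewens formula the expected number of distinct species is \(\mathbf{E}(P_k)=\sum_{i=0}^{k-1}\gamma/(\gamma+i)\), and equating it to the observed value gives the maximum-likelihood estimate. The function names and the search bounds are hypothetical.

```python
def expected_distinct(gamma, k):
    """E(P_k) under the Ewens sampling formula with parameter gamma > 0."""
    return sum(gamma / (gamma + i) for i in range(k))


def estimate_gamma(k, p_obs, lo=1e-9, hi=1e9, iters=200):
    """Solve E(P_k) = p_obs for gamma by bisection on a logarithmic scale.

    E(P_k) increases from 1 to k as gamma grows, so a root exists only
    when 1 < p_obs < k.
    """
    if not 1 < p_obs < k:
        raise ValueError("need 1 < p_obs < k for a finite positive estimate")
    for _ in range(iters):
        mid = (lo * hi) ** 0.5  # geometric midpoint of the bracket
        if expected_distinct(mid, k) < p_obs:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5
```

With \(k=3\) and an observed value of \(11/6\), the solver returns a value close to 1, because \(1+1/2+1/3=11/6\) is exactly \(\mathbf{E}(P_3)\) at \(\gamma=1\).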
We then exhibit a large special class of Gibbs-Poisson distributions with the property that sampling from a discrete abundance model may equivalently be viewed as sampling from a random partition of unity, now in the continuum. When \(n\) is finite, this partition may be built by normalizing \(n\) infinitely divisible iid positive random variables by their partial sum. It is shown that the sampling process in the continuum should generically be biased on the total length appearing in the latter normalization. A construction with size-biased sampling from the ranked normalized jumps of a subordinator is also supplied, should the problem under study present infinitely many species. We illustrate our point of view with many examples, some of which are new.

MSC:

62P12 Applications of statistics to environmental and related topics
62D05 Sampling theory, sample surveys
62F10 Point estimation
