On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size. (English) Zbl 1466.60053

Summary: The Ewens sampling formula was introduced in the context of population genetics by Warren John Ewens in 1972 and has since appeared in many other scientific fields. Approximation results for the Ewens sampling formula are abundant, especially when one of the parameters, the sample size \(n\) or the mutation parameter \(\theta\) (the scaled mutation rate), tends to infinity while the other is fixed. By contrast, the case in which \(\theta\) grows with \(n\) has been considered in relatively few works, although this asymptotic setup is also natural. In this paper, for \(\theta\) growing with \(n\), we advance the study of the asymptotic properties of the total number of alleles and of the component counts in the allelic partition under the Ewens sampling formula, from the viewpoint of Poisson approximations. Specifically, the main contributions of this paper are Poisson approximations of the total number of alleles, an independent process approximation of the small component counts, and functional central limit theorems, all under the asymptotic regime in which both \(n\) and \(\theta\) tend to infinity.
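As a rough numerical illustration of what a Poisson approximation of the total number of alleles means, the sketch below relies on a standard fact that is not stated in the summary: under the Ewens sampling formula, the number of alleles \(K_n\) has the same law as a sum of independent Bernoulli variables with success probabilities \(\theta/(\theta+i-1)\), \(i=1,\dots,n\). The function names, and the choice to compare against a Poisson law with the same mean, are assumptions made for this example; the paper's own centering and error bounds may differ.

```python
import math

def allele_count_pmf(n, theta):
    """Exact pmf of K_n, built as a sum of independent
    Bernoulli(theta / (theta + i - 1)) variables for i = 1..n."""
    pmf = [1.0]  # pmf of the empty sum: all mass at 0
    for i in range(1, n + 1):
        p = theta / (theta + i - 1)
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # i-th sampled gene is not a new allele
            nxt[k + 1] += mass * p       # i-th sampled gene is a new allele
        pmf = nxt
    return pmf

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax, computed iteratively.
    Note: exp(-lam) underflows for very large lam (hypothetical helper)."""
    out = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        out.append(out[-1] * lam / k)
    return out

def tv_distance_to_poisson(n, theta):
    """Total variation distance between K_n and a Poisson law
    with the same mean (mean matching is an assumption of this sketch)."""
    pmf = allele_count_pmf(n, theta)
    lam = sum(k * m for k, m in enumerate(pmf))
    pois = poisson_pmf(lam, n)
    tail = max(0.0, 1.0 - sum(pois))  # Poisson mass above n, where K_n has none
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf, pois)) + tail)
```

Calling `tv_distance_to_poisson(n, theta)` for several pairs, for example with `theta` fixed and with `theta` proportional to `n`, gives a quick feel for how the quality of the approximation depends on the regime. The convolution costs on the order of \(n^2\) operations, so it is only meant for small experiments.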

MSC:

60F05 Central limit and other weak theorems
60B12 Limit theorems for vector-valued random variables (infinite-dimensional case)
62E20 Asymptotic distribution theory in statistics
92D10 Genetics and epigenetics
17D92 Genetic algebras

References:

[1] Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2 1152-1174. · Zbl 0335.60034 · doi:10.1214/aos/1176342871
[2] Arratia, R., Barbour, A. D. and Tavaré, S. (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2 519-535. · Zbl 0756.60006 · doi:10.1214/aoap/1177005647
[3] Arratia, R., Barbour, A. D. and Tavaré, S. (2000). Limits of logarithmic combinatorial structures. Ann. Probab. 28 1620-1644. · Zbl 1044.60003 · doi:10.1214/aop/1019160500
[4] Arratia, R., Barbour, A. D. and Tavaré, S. (2016). Exploiting the Feller coupling for the Ewens sampling formula [comment on MR3458585]. Statist. Sci. 31 27-29. · Zbl 1442.60009 · doi:10.1214/15-STS537
[5] Arratia, R. and DeSalvo, S. (2016). Probabilistic divide-and-conquer: A new exact simulation method, with integer partitions as an example. Combin. Probab. Comput. 25 324-351. · Zbl 1372.60006 · doi:10.1017/S0963548315000358
[6] Arratia, R., Stark, D. and Tavaré, S. (1995). Total variation asymptotics for Poisson process approximations of logarithmic combinatorial assemblies. Ann. Probab. 23 1347-1388. · Zbl 0833.60010 · doi:10.1214/aop/1176988188
[7] Arratia, R. and Tavaré, S. (1992a). Limit theorems for combinatorial structures via discrete process approximations. Random Structures Algorithms 3 321-345. · Zbl 0758.60009 · doi:10.1002/rsa.3240030310
[8] Arratia, R. and Tavaré, S. (1992b). The cycle structure of random permutations. Ann. Probab. 20 1567-1591. · Zbl 0759.60007 · doi:10.1214/aop/1176989707
[9] Arratia, R., Barbour, A. D., Ewens, W. J. and Tavaré, S. (2018). Dual diffusions, killed diffusions, and the age distribution problem in population genetics. Theor. Popul. Biol. 122 5-11. · Zbl 1405.92158
[10] Barbour, A. D. (1992). Refined approximations for the Ewens sampling formula. Random Structures Algorithms 3 267-276. · Zbl 0798.60010 · doi:10.1002/rsa.3240030306
[11] Barbour, A. D. and Hall, P. (1984). On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95 473-480. · Zbl 0544.60029 · doi:10.1017/S0305004100061806
[12] Barbour, A. D., Holst, L. and Janson, S. (1992). Poisson Approximation. Oxford Studies in Probability 2. The Clarendon Press, New York. · Zbl 0746.60002
[13] Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci. 31 1-19. · Zbl 1442.60010 · doi:10.1214/15-STS529
[14] DeLaurentis, J. M. and Pittel, B. G. (1985). Random permutations and Brownian motion. Pacific J. Math. 119 287-301. · Zbl 0578.60033 · doi:10.2140/pjm.1985.119.287
[15] DeSalvo, S. (2018). Probabilistic divide-and-conquer: Deterministic second half. Adv. in Appl. Math. 92 17-50. · Zbl 1375.65084 · doi:10.1016/j.aam.2017.06.005
[16] Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3 87-112; erratum, ibid. 3 (1972), 240; erratum, ibid. 3 (1972), 376. · Zbl 0245.92009 · doi:10.1016/0040-5809(72)90035-4
[17] Favaro, S. and James, L. F. (2016). Relatives of the Ewens sampling formula in Bayesian nonparametrics [comment on MR3458585]. Statist. Sci. 31 30-33. · Zbl 1442.62108 · doi:10.1214/15-STS538
[18] Feng, S. (2007). Large deviations associated with Poisson-Dirichlet distribution and Ewens sampling formula. Ann. Appl. Probab. 17 1570-1595. · Zbl 1145.92025 · doi:10.1214/105051607000000230
[19] Feng, S. (2010). The Poisson-Dirichlet Distribution and Related Topics: Models and Asymptotic Behaviors. Springer, Heidelberg. · Zbl 1214.60001
[20] Feng, S. (2016). Diffusion processes and the Ewens sampling formula [comment on MR3458585]. Statist. Sci. 31 20-22. · Zbl 1442.60013 · doi:10.1214/15-STS535
[21] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209-230. · Zbl 0255.62037 · doi:10.1214/aos/1176342360
[22] Flajolet, P. and Soria, M. (1990). Gaussian limiting distributions for the number of components in combinatorial structures. J. Combin. Theory Ser. A 53 165-182. · Zbl 0691.60035 · doi:10.1016/0097-3165(90)90056-3
[23] Goncharov, V. L. (1944). Some facts from combinatorics. Izv. Akad. Nauk SSSR, Ser. Mat. 8 3-48. · Zbl 0063.01685
[24] Hansen, J. C. (1990). A functional central limit theorem for the Ewens sampling formula. J. Appl. Probab. 27 28-43. · Zbl 0704.92011 · doi:10.2307/3214593
[25] Johnson, N. L., Kotz, S. and Balakrishnan, N. (1997). Discrete Multivariate Distributions. Wiley, New York. · Zbl 0868.62048
[26] Knuth, D. E. and Wilf, H. S. (1989). A short proof of Darboux’s lemma. Appl. Math. Lett. 2 139-140. · Zbl 0708.30001 · doi:10.1016/0893-9659(89)90007-4
[27] Mano, S. (2017). Extreme sizes in Gibbs-type exchangeable random partitions. Ann. Inst. Statist. Math. 69 1-37. · Zbl 1398.60023 · doi:10.1007/s10463-015-0530-0
[28] Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145-158. · Zbl 0821.60047 · doi:10.1007/BF01213386
[29] Shepp, L. A. and Lloyd, S. P. (1966). Ordered cycle lengths in a random permutation. Trans. Amer. Math. Soc. 121 340-357. · Zbl 0156.18705 · doi:10.1090/S0002-9947-1966-0195117-8
[30] Teh, Y. W. (2016). Bayesian nonparametric modeling and the ubiquitous Ewens sampling formula [comment on MR3458585]. Statist. Sci. 31 34-36. · Zbl 1442.62109 · doi:10.1214/15-STS540
[31] Tsukuda, K. (2017a). A change detection procedure for an ergodic diffusion process. Ann. Inst. Statist. Math. 69 833-864. · Zbl 1382.62044 · doi:10.1007/s10463-016-0564-y
[32] Tsukuda, K. (2017b). Estimating the large mutation parameter of the Ewens sampling formula. J. Appl. Probab. 54 42-54. Correction: to appear in J. Appl. Probab. 55, no. 3. · Zbl 1401.62035 · doi:10.1017/jpr.2016.85
[33] Tsukuda, K. (2018). Functional central limit theorems in \(L^2(0,1)\) for logarithmic combinatorial assemblies. Bernoulli 24 1033-1052. · Zbl 1429.60034 · doi:10.3150/16-BEJ847
[34] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge. · Zbl 0910.62001
[35] Varron, D. (2014). Donsker and Glivenko-Cantelli theorems for a class of processes generalizing the empirical process. Electron. J. Stat. 8 2296-2320. · Zbl 1320.60092 · doi:10.1214/14-EJS955
[36] Watterson, G. A. (1974a). Models for the logarithmic species abundance distributions. Theor. Popul. Biol. 6 217-250. · Zbl 0292.92003 · doi:10.1016/0040-5809(74)90025-2
[37] Watterson, G. A. (1974b). The sampling theory of selectively neutral alleles. Adv. in Appl. Probab. 6 463-488. · Zbl 0289.62020 · doi:10.2307/1426228
[38] Yamato, H. (2013). Edgeworth expansions for the number of distinct components associated with the Ewens sampling formula. J. Japan Statist. Soc. 43 17-28. · Zbl 1285.62019 · doi:10.14490/jjss.43.17
[39] Yannaros, N. (1991). Poisson approximation for random sums of Bernoulli random variables. Statist. Probab. Lett. 11 161-165. · Zbl 0728.60051 · doi:10.1016/0167-7152(91)90135-E
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.