Distance-based species tree estimation under the coalescent: information-theoretic trade-off between number of loci and sequence length. (English) Zbl 1379.92040
Summary: We consider the reconstruction of a phylogeny from multiple genes under the multispecies coalescent. We establish a connection with the sparse signal detection problem, where one seeks to distinguish between a distribution and a mixture of the distribution and a sparse signal. Using this connection, we derive an information-theoretic trade-off between the number of genes, \(m\), needed for an accurate reconstruction and the sequence length, \(k\), of the genes. Specifically, we show that to detect a branch of length \(f\), one needs \(m=\Theta(1/[f^{2}\sqrt{k}])\) genes.
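As an illustration of the stated trade-off only: the sketch below evaluates the scaling \(1/(f^{2}\sqrt{k})\) to compare how the required number of genes changes when \(k\) or \(f\) changes. The function name `relative_loci_needed` and the constant `c` are hypothetical. The \(\Theta\) notation hides constants that the summary does not specify, so only ratios between settings are meaningful here, not absolute gene counts.

```python
import math


def relative_loci_needed(f: float, k: int, c: float = 1.0) -> float:
    """Evaluate c / (f^2 * sqrt(k)), the scaling of m given in the summary.

    c is a hypothetical placeholder for the constant hidden by Theta;
    the returned value is meaningful only relative to other settings.
    """
    if f <= 0 or k <= 0:
        raise ValueError("f and k must be positive")
    return c / (f ** 2 * math.sqrt(k))


# Baseline: branch length f = 0.01, sequence length k = 100 (arbitrary values).
base = relative_loci_needed(f=0.01, k=100)

# Quadrupling the sequence length halves the required number of genes,
# because m scales as 1 / sqrt(k).
longer_sequences = relative_loci_needed(f=0.01, k=400)

# Halving the branch length quadruples the required number of genes,
# because m scales as 1 / f^2.
shorter_branch = relative_loci_needed(f=0.005, k=100)

print(longer_sequences / base)
print(shorter_branch / base)
```

Under this scaling, longer sequences reduce the number of genes needed only at a square-root rate, whereas shortening the branch to be detected raises it quadratically.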
MSC:
92D15 | Problems related to evolution |
60K35 | Interacting random processes; statistical mechanics type models; percolation theory |
92D10 | Genetics and epigenetics |