Distance-based species tree estimation under the coalescent: information-theoretic trade-off between number of loci and sequence length. (English) Zbl 1379.92040

Summary: We consider the reconstruction of a phylogeny from multiple genes under the multispecies coalescent. We establish a connection with the sparse signal detection problem, where one seeks to distinguish between a distribution and a mixture of the distribution and a sparse signal. Using this connection, we derive an information-theoretic trade-off between the number of genes, \(m\), needed for an accurate reconstruction and the sequence length, \(k\), of the genes. Specifically, we show that to detect a branch of length \(f\), one needs \(m=\Theta(1/[f^{2}\sqrt{k}])\) genes.
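
As a rough illustration of the stated scaling \(m=\Theta(1/[f^{2}\sqrt{k}])\), the following minimal sketch evaluates only the term \(1/(f^{2}\sqrt{k})\). The function name `loci_scaling` and the sample values of \(f\) and \(k\) are hypothetical and not taken from the paper. Because the \(\Theta\) notation hides constant factors, only ratios between the returned values are meaningful, not the absolute numbers.

```python
import math


def loci_scaling(f: float, k: int) -> float:
    """Return 1 / (f^2 * sqrt(k)), the scaling term in m = Theta(1 / (f^2 sqrt(k))).

    This is NOT an actual gene count: the constants hidden by Theta are unknown
    here, so the value is useful only for comparing settings to each other.
    """
    if f <= 0 or k <= 0:
        raise ValueError("f and k must be positive")
    return 1.0 / (f ** 2 * math.sqrt(k))


# Hypothetical baseline: branch length f = 0.01, sequence length k = 100.
base = loci_scaling(f=0.01, k=100)

# Quadrupling the sequence length k halves the scaling term.
ratio_longer_sequences = loci_scaling(f=0.01, k=400) / base

# Halving the branch length f quadruples the scaling term.
ratio_shorter_branch = loci_scaling(f=0.005, k=100) / base

print(ratio_longer_sequences, ratio_shorter_branch)
```

Under this scaling, longer sequences reduce the number of genes needed only at a square-root rate, whereas shorter branches raise it quadratically.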

MSC:

92D15 Problems related to evolution
60K35 Interacting random processes; statistical mechanics type models; percolation theory
92D10 Genetics and epigenetics