×

Improved maximum likelihood reconstruction of complex multi-generational pedigrees. (English) Zbl 1303.92085

Summary: The reconstruction of pedigrees from genetic marker data is relevant to a wide range of applications. Likelihood-based approaches aim to find the pedigree structure that gives the highest probability to the observed data. Existing methods either entail an exhaustive search and are hence restricted to small numbers of individuals, or they take a more heuristic approach and deliver a solution that will probably have high likelihood but is not guaranteed to be optimal. By encoding the pedigree learning problem as an integer linear program we can exploit efficient optimisation algorithms to construct pedigrees guaranteed to have maximal likelihood for the standard situation where we have complete marker data at unlinked loci and segregation of genes from parents to offspring is Mendelian. Previous work demonstrated efficient reconstruction of pedigrees of up to about 100 individuals. The modified method that we present here is not so restricted: we demonstrate its applicability with simulated data on a real human pedigree structure of over 1600 individuals. It also compares well with a very competitive approximate approach in terms of solving time and accuracy. In addition to identifying a maximum likelihood pedigree, we can obtain any number of pedigrees in decreasing order of likelihood. This is useful for assessing the uncertainty of a maximum likelihood solution and permits model averaging over high likelihood pedigrees when this would be appropriate. More importantly, when the solution is not unique, as will often be the case for large pedigrees, it enables investigation into the properties of maximum likelihood pedigree estimates which has not been possible up to now. Crucially, we also have a means of assessing the behaviour of other approximate approaches which all aim to find a maximum likelihood solution. Our approach hence allows us to properly address the question of whether a reasonably high likelihood solution that is easy to obtain is practically as useful as a guaranteed maximum likelihood solution. The efficiency of our method on such large problems bodes well for extensions beyond the standard setting where some pedigree members may be latent, genotypes may be measured with error and markers may be linked.

MSC:

92D15 Problems related to evolution
92D10 Genetics and epigenetics
90C10 Integer programming

Software:

FRANz

References:

[1] Almudevar, A., A simulated annealing algorithm for maximum likelihood pedigree reconstruction, Theor. Popul. Biol., 63, 63-75 (2003) · Zbl 1104.62115
[2] Bartlett, M.; Cussens, J., Advances in Bayesian network learning using Integer Programming, (Nicholson, A.; Smyth, P., Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence. Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence, (UAI 2013) (2013), AUAI Press), 182-191
[3] Butler, J. M.; Schoske, R.; Vallone, P. M.; Redman, J. W.; Kline, M. C., Allele frequencies for 15 autosomal STR loci on US Caucasian, African American, and Hispanic populations, J. Forensic Sci., 48, 4 (2003)
[4] Choi, Y.; Wijsman, E. M.; Weir, B. S., Case-control testing in the presence of unknown relaionships, Genet. Epidemiol., 33, 668-678 (2009)
[5] Cowell, R. G., Efficient maximum likelihood pedigree reconstruction, Theor. Popul. Biol., 76, 4, 285-291 (2009) · Zbl 1403.92135
[6] Cowell, R. G., A simple greedy algorithm for reconstructing pedigrees, Theor. Popul. Biol., 83, 55-63 (2013)
[8] Cussens, J., Bayesian network learning with cutting planes, (Cozman, F. G.; Pfeffer, A., Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence. Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, (UAI 2011) (2011), AUAI Press), 153-160
[9] Cussens, J.; Bartlett, M.; Jones, E. M.; Sheehan, N. A., Maximum likelihood pedigree reconstruction using integer linear programming, Genet. Epidemiol., 37, 69-83 (2013)
[10] Dawid, A. P.; Mortera, J.; Pascali, V. L.; van Boxel, D., Probabilistic expert systems for forensic infererence from genetic markers, Scand. J. Statist., 29, 577-595 (2002) · Zbl 1035.62111
[11] Edwards, A. W.F., The structure of the Polar Eskimo genealogy, Hum. Hered., 42, 242-252 (1992)
[12] Génin, E.; Clerget-Darpoux, F., Consanguinity and the sib-pair method: an approach using identity by descent between and within individuals, Am. J. Hum. Genet., 59, 1149-1162 (1996)
[13] Gilberg, A.; Gilberg, L.; Gilberg, R.; Holm, M., (Polar Eskimo Genealogy. Polar Eskimo Genealogy, Meddelelser om Gronland, vol. 203, Nr. 4 (1978), Nyt Nordisk Forlag Arnold Busck: Nyt Nordisk Forlag Arnold Busck Kobenhavn)
[14] Glazner, C.; Thompson, E. A., Improving pedigree-based linkage analysis by estimating co-ancestry among families, Stat. Appl. Genet. Mol. Biol., 11, 2 (2012), Article 11
[15] Jaakkola, T.; Sontag, D.; Globerson, A.; Meila, M., Learning Bayesian network structures using LP relaxations, (Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, AISTATS 2010. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, AISTATS 2010, Journal of Machine Learning Research Workshop and Conference Proceedings, vol. 9 (2010)), 358-365
[16] Jankovic, A.; vonHoldt, B. M.; Rosenberg, N. A., Heterozygosity of the Yellowstone wolves, Mol. Ecol., 19, 3246-3249 (2010)
[17] Lander, E. S.; Green, P., Construction of multilocus genetic linkage maps in humans, Proc. Natl. Acad. Sci. (USA), 84, 2363-2367 (1987)
[18] Lange, K.; Elston, R. C., Extensions to pedigree analysis. I. Likelihood calculations for simple and complex pedigrees, Hum. Hered., 25, 95-105 (1975)
[19] Lauritzen, S. L., Graphical Models (1996), Clarendon Press: Clarendon Press Oxford, United Kingdom · Zbl 0907.62001
[20] Lauritzen, S. L.; Sheehan, N., Graphical models for genetic analyses, Statist. Sci., 18, 489-514 (2003) · Zbl 1055.62126
[21] Malone, B.; Kangas, K.; Järvisalo, M.; Koivisto, M.; Myllymäki, P., Predicting the hardness of learning Bayesian networks, (Proceedings of the 28th AAAI Conference on Artificial Intelligence. Proceedings of the 28th AAAI Conference on Artificial Intelligence, (AAAI 2014) (2014), AAAI Press)
[22] Riester, M.; Stadler, P. F.; Klemm, K., FRANz: reconstruction of wild multi-generation pedigrees, Bioinformatics, 25, 2134-2139 (2009)
[23] Sheehan, N., Sampling genotypes on complex pedigrees with phenotypic constraints: the origin of the B allele among the Polar Eskimos, IMA J. Math. Appl. Med. Biol., 9, 1-18 (1992) · Zbl 0825.92092
[24] Sheehan, N. A., On the application of Markov chain Monte Carlo methods to genetic analyses on complex pedigrees, Internat. Statist. Rev., 68, 83-110 (2000) · Zbl 1107.92305
[25] Sheehan, N. A.; Egeland, T., Adjusting for founder relatedness in a linkage analysis using prior information, Hum. Hered., 65, 4, 221-231 (2008)
[26] Stankovich, J.; Bahlo, M.; Rubio, J. P.; Wilkinson, C. R.; Thomson, R.; Banks, A.; Ring, M.; Foote, S. J.; Speed, T. P., Identifying nineteenth century links from genotypes, Hum. Genet., 117, 188-199 (2005)
[27] Staples, J.; Nickerson, D. A.; Below, J. E., Utilizing graph theory to select the largest set of unrelated individuals for genetic analyses, Genet. Epidemiol., 37, 136-141 (2013)
[28] Thompson, E. A., Pedigree Analysis in Human Genetics (1986), The Johns Hopkins University Press: The Johns Hopkins University Press Baltimore
[29] Thompson, E. A., (Statistical Inference from Genetic Data on Pedigrees. Statistical Inference from Genetic Data on Pedigrees, NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 6 (2000), Institute of Mathematical Statistics and the American Statistical Association: Institute of Mathematical Statistics and the American Statistical Association Beachwood, Ohio, USA) · Zbl 0972.92022
[30] Thornton, T.; McPeek, M. S., ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure, Am. J. Hum. Genet., 86, 172-184 (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.