Standard statistical tools for the breed allocation problem. (English) Zbl 1514.62744

Summary: Modern technologies are frequently used to address new genomic problems. For instance, the STRUCTURE software is commonly employed for breed assignment based on genetic information. However, standard statistical techniques offer a number of valuable tools that can be applied successfully to most of these problems. In this paper, we investigate the capability of microsatellite markers for individual identification and their potential use for breed assignment of individuals in seventy Lidia breed lines and breeders. A traditional binomial logistic regression model is fitted for each line and used to assign an individual to a particular line. In addition, the area under the receiver operating characteristic curve (AUC) is used to measure how well the microsatellite-based models separate the groups. This method also allows us to identify which microsatellite loci are related to each line. Overall, only one subject was misclassified, which corresponds to a correct allocation rate of 99.94%. The minimum observed AUC was 0.986, with an average of 0.997. These results suggest that our method is competitive for animal allocation, has some interpretative advantages, and has a strong relationship with methods based on SNPs and related techniques.
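
The per-line procedure described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the data are simulated toy allele counts, the helper names (`fit_logistic`, `predict_proba`, `auc`, `simulate`) are hypothetical, the small ridge penalty is added only for numerical stability, and assigning each individual to the line with the highest fitted probability is an assumption, since the summary does not state the assignment rule. The AUC is computed in-sample through its Mann-Whitney interpretation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    """Binomial logistic regression by batch gradient descent.

    Returns [intercept, coef_1, ..., coef_p]. The ridge term l2 is an
    assumption of this sketch, not something stated in the summary.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w[0] -= lr * grad[0] / n
        for j in range(1, p + 1):
            w[j] -= lr * (grad[j] / n + l2 * w[j])
    return w


def predict_proba(w, x):
    # Fitted probability that an individual belongs to the modelled line.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def auc(scores, labels):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) (Mann-Whitney form)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_lines=3, n_per_line=40, n_features=6, seed=0):
    """Toy data: each feature counts copies (0, 1 or 2) of one allele,
    drawn with line-specific allele frequencies. Purely illustrative."""
    rng = random.Random(seed)
    X, line = [], []
    for k in range(n_lines):
        freqs = [rng.uniform(0.05, 0.95) for _ in range(n_features)]
        for _ in range(n_per_line):
            X.append([(rng.random() < f) + (rng.random() < f) for f in freqs])
            line.append(k)
    return X, line


X, line = simulate()

# One binomial model per line: "belongs to line k" versus "all other lines".
models, aucs = {}, {}
for k in sorted(set(line)):
    y = [1 if l == k else 0 for l in line]
    models[k] = fit_logistic(X, y)
    aucs[k] = auc([predict_proba(models[k], x) for x in X], y)

# Assumed assignment rule: the line whose model gives the highest probability.
assigned = [max(models, key=lambda k: predict_proba(models[k], x)) for x in X]
accuracy = sum(a == l for a, l in zip(assigned, line)) / len(line)

print("per-line AUC:", {k: round(v, 3) for k, v in aucs.items()})
print("in-sample correct allocation rate:", round(accuracy, 4))
```

Because the coefficients of each per-line model are attached to individual allele-count features, inspecting the largest coefficients is one way to see which loci drive the separation for that line, which is the interpretative advantage the summary mentions. The figures printed by this sketch come from simulated data and say nothing about the results reported in the paper.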

MSC:

62-XX Statistics
