A statistical pipeline for identifying physical features that differentiate classes of 3D shapes. (English) Zbl 1478.62390

Summary: The recent curation of large-scale databases with 3D surface scans of shapes has motivated the development of tools that better detect global patterns in morphological variation. Studies, which focus on identifying differences between shapes, have been limited to simple pairwise comparisons and rely on prespecified landmarks (that are often known). We present SINATRA, the first statistical pipeline for analyzing collections of shapes without requiring any correspondences. Our novel algorithm takes in two classes of shapes and highlights the physical features that best describe the variation between them. We use a rigorous simulation framework to assess our approach. Lastly, as a case study we use SINATRA to analyze mandibular molars from four different suborders of primates and demonstrate its ability to recover known morphometric variation across phylogenies.

MSC:

62R40 Topological data analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H35 Image analysis in multivariate analysis
60G15 Gaussian processes
92D15 Problems related to evolution

References:

[1] Adler, D., Nenadic, O. and Zucchini, W. (2003). RGL: An R-library for 3D visualization with OpenGL. In Proceedings of the 35th Symposium of the Interface: Computing Science and Statistics, Salt Lake City 35 1-11.
[2] Anderson, J. T., Willis, J. H. and Mitchell-Olds, T. (2011). Evolutionary genetics of plant adaptation. Trends Genet. 27 258-266.
[3] Belongie, S. (1999). Rodrigues’ rotation formula. From MathWorld—A Wolfram Web Resource, created by Eric W. Weisstein. Available at http://mathworld.wolfram.com/RodriguesRotationFormula.html.
[4] Bendich, P., Marron, J. S., Miller, E., Pieloch, A. and Skwerer, S. (2016). Persistent homology analysis of brain artery trees. Ann. Appl. Stat. 10 198-218. · doi:10.1214/15-AOAS886
[5] Boyer, D. M., Lipman, Y., St Clair, E., Puente, J., Patel, B. A., Funkhouser, T., Jernvall, J. and Daubechies, I. (2011). Algorithms to automatically quantify the geometric similarity of anatomical surfaces. Proc. Natl. Acad. Sci. USA 108 18221-18226. · doi:10.1073/pnas.1112822108
[6] Boyer, D. M., Puente, J., Gladman, J. T., Glynn, C., Mukherjee, S., Yapuncich, G. S. and Daubechies, I. (2015). A new fully automated approach for aligning and comparing shapes. Anat. Rec. (Hoboken) 298 249-276.
[7] Boyer, D. M., Gunnell, G. F., Kaufman, S. and McGeary, T. M. (2016). Morphosource: Archiving and sharing 3-D digital specimen data. The Paleontological Society Papers 22 157-181.
[8] Cates, J., Elhabian, S. and Whitaker, R. (2017). Shapeworks: Particle-based shape correspondence and visualization software. In Statistical Shape and Deformation Analysis 257-298. Elsevier, Amsterdam.
[9] Chaudhuri, A., Kakde, D., Sadek, C., Gonzalez, L. and Kong, S. (2017). The mean and median criteria for kernel bandwidth selection for support vector data description. In IEEE International Conference on Data Mining Workshops (ICDMW), 2017 842-849.
[10] Chen, J., Källman, T., Ma, X., Gyllenstrand, N., Zaina, G., Morgante, M., Bousquet, J., Eckert, A., Wegrzyn, J. et al. (2012). Disentangling the roles of history and local selection in shaping clinal variation of allele frequencies and gene expression in Norway spruce (Picea abies). Genetics 191 865-881.
[11] Cheng, L., Ramchandran, S., Vatanen, T., Lietzén, N., Lahesmaa, R., Vehtari, A. and Lähdesmäki, H. (2019). An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nat. Commun. 10 1798.
[12] Cover, T. and Hart, P. (1967). Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13 21-27. · Zbl 0154.44505
[13] Crawford, L., Zeng, P., Mukherjee, S. and Zhou, X. (2017). Detecting epistasis with the marginal epistasis test in genetic mapping studies of quantitative traits. PLoS Genet. 13 e1006869.
[14] Crawford, L., Wood, K. C., Zhou, X. and Mukherjee, S. (2018). Bayesian approximate kernel regression with variable selection. J. Amer. Statist. Assoc. 113 1710-1721. · Zbl 1409.62132 · doi:10.1080/01621459.2017.1361830
[15] Crawford, L., Flaxman, S. R., Runcie, D. E. and West, M. (2019). Variable prioritization in nonlinear black box methods: A genetic association case study. Ann. Appl. Stat. 13 958-989. · Zbl 1423.62062 · doi:10.1214/18-AOAS1222
[16] Crawford, L., Monod, A., Chen, A. X., Mukherjee, S. and Rabadán, R. (2020). Predicting clinical outcomes in glioblastoma: An application of topological and functional data analysis. J. Amer. Statist. Assoc. 115 1139-1150. · Zbl 1441.62316 · doi:10.1080/01621459.2019.1671198
[17] Crompton, R. H., Savage, R. and Spears, I. R. (1998). The mechanics of food reduction in Tarsius bancanus. Hard-object feeder, soft-object feeder or both? Folia Primatol. (Basel) 69 Suppl 1 41-59. · doi:10.1159/000052698
[18] Curry, J., Mukherjee, S. and Turner, K. (2019). How many directions determine a shape and other sufficiency results for two topological transforms. Available at arXiv:1805.09782.
[19] Dupuis, P., Grenander, U. and Miller, M. I. (1998). Variational problems on flows of diffeomorphisms for image matching. Quart. Appl. Math. 56 587-600. · Zbl 0949.49002 · doi:10.1090/qam/1632326
[20] Fasy, B. T., Micka, S., Millman, D. L., Schenfisch, A. and Williams, L. (2018). Challenges in reconstructing shapes from Euler characteristic curves. Available at arXiv:1811.11337.
[21] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1-22.
[22] Gao, T. (2015). Hypoelliptic diffusion maps and their applications in automated geometric morphometrics, Ph.D. thesis, Duke Univ.
[23] Gao, T. (2021). The diffusion geometry of fibre bundles: Horizontal diffusion maps. Appl. Comput. Harmon. Anal. 50 147-215. · Zbl 1464.58011 · doi:10.1016/j.acha.2019.08.001
[24] Gao, T., Kovalsky, S. Z. and Daubechies, I. (2019). Gaussian process landmarking on manifolds. SIAM J. Math. Data Sci. 1 208-236. · Zbl 1499.60114 · doi:10.1137/18M1184035
[25] Gao, T., Kovalsky, S. Z., Boyer, D. M. and Daubechies, I. (2019). Gaussian process landmarking for three-dimensional geometric morphometrics. SIAM J. Math. Data Sci. 1 237-267. · Zbl 1499.60113 · doi:10.1137/18M1203481
[26] Ghrist, R., Levanger, R. and Mai, H. (2018). Persistent homology and Euler integral transforms. J. Appl. Comput. Topol. 2 55-60. · Zbl 1461.58006 · doi:10.1007/s41468-018-0017-1
[27] Gienapp, P., Teplitsky, C., Alho, J. S., Mills, J. A. and Merilä, J. (2008). Climate change and evolution: Disentangling environmental and genetic responses. Mol. Ecol. 17 167-178.
[28] Gopalan, G. and Bornn, L. (2015). FastGP: An R package for Gaussian processes. Available at arXiv:1507.06055.
[29] Goswami, A. (2015). Phenome10K: A free online repository for 3-D scans of biological and palaeontological specimens. Website.
[30] Grafen, A. (1989). The phylogenetic regression. Philos. Trans. R. Soc. Lond. B, Biol. Sci. 326 87-99.
[31] Guatelli-Steinberg, D. (2003). Primate dentition: An introduction to the teeth of non-human primates. Am. J. Phys. Anthropol. 121 189-189.
[32] Heckerman, D., Gurdasani, D., Kadie, C., Pomilla, C., Carstensen, T., Martin, H., Ekoru, K., Nsubuga, R. N., Ssenyomo, G. et al. (2016). Linear mixed model for heritability estimation that explicitly addresses environmental variation. Proc. Natl. Acad. Sci. USA 113 7377-7382.
[33] Henderson, C. R. (1984). Applications of Linear Models in Animal Breeding. Guelph, Ont.: University of Guelph. Includes index.
[34] Hong, Y., Golland, P. and Zhang, M. (2017). Fast geodesic regression for population-based image analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention 317-325. Springer, Berlin.
[35] Huang, R., Achlioptas, P., Guibas, L. and Ovsjanikov, M. (2019). Limit shapes-A tool for understanding shape differences and variability in 3D model collections. Comput. Graph. Forum 38 187-202.
[36] Jiang, Y. and Reif, J. C. (2015). Modeling epistasis in genomic selection. Genetics 201 759-768.
[37] Kang, H. M., Zaitlen, N. A., Wade, C. M., Kirby, A., Heckerman, D., Daly, M. J. and Eskin, E. (2008). Efficient control of population structure in model organism association mapping. Genetics 178 1709-1723.
[38] Kang, H. M., Sul, J. H., Service, S. K., Zaitlen, N. A., Kong, S.-Y., Freimer, N. B., Sabatti, C. and Eskin, E. (2010). Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42 348-354.
[39] Kendall, D. G. (1989). A survey of the statistical theory of shape. Statist. Sci. 4 87-120. · Zbl 0955.60507
[40] Lai, Y.-T., Yeung, C. K. L., Omland, K. E., Pang, E.-L., Hao, Y., Liao, B.-Y., Cao, H.-F., Zhang, B.-W., Yeh, C.-F. et al. (2019). Standing genetic variation as the predominant source for adaptation of a songbird. Proc. Natl. Acad. Sci. USA 116 2152-2157.
[41] Li, F., Zhang, T., Wang, Q., Gonzalez, M. Z., Maresh, E. L. and Coan, J. A. (2015). Spatial Bayesian variable selection and grouping for high-dimensional scalar-on-image regression. Ann. Appl. Stat. 9 687-713. · Zbl 1397.62458 · doi:10.1214/15-AOAS818
[42] McDonald, K. R., Broderick, W. F., Huettel, S. A. and Pearson, J. M. (2019). Bayesian nonparametric models characterize instantaneous strategies in a competitive dynamic game. Nat. Commun. 10 1808.
[43] Miller, E. (2015). Fruit flies and moduli: Interactions between biology and mathematics. Notices Amer. Math. Soc. 62 1178-1184. · Zbl 1338.92002 · doi:10.1090/noti1290
[44] Neal, R. M. (1997). Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical Report No. 9702, Dept. of Statistics, Univ. Toronto.
[45] Neal, R. M. (1999). Regression and classification using Gaussian process priors. In Bayesian Statistics, 6 (Alcoceber, 1998) 475-501. Oxford Univ. Press, New York. · Zbl 0974.62072
[46] Nickisch, H. and Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. J. Mach. Learn. Res. 9 2035-2078. · Zbl 1225.62087
[47] Oudot, S. and Solomon, E. (2018). Inverse problems in topological persistence. Available at arXiv:1810.10813. · Zbl 1447.55006
[48] Ovsjanikov, M., Ben-Chen, M., Solomon, J., Butscher, A. and Guibas, L. (2012). Functional maps: A flexible representation of maps between shapes. ACM Trans. Graph. 31 30:1-30:11.
[49] Pillai, N. S., Wu, Q., Liang, F., Mukherjee, S. and Wolpert, R. L. (2007). Characterizing the function space for Bayesian kernel models. J. Mach. Learn. Res. 8 1769-1797. · Zbl 1222.62039
[50] Pozzi, L., Hodgson, J. A., Burrell, A. S., Raaum, R. L. and Disotell, T. R. (2014). Primate phylogenetic relationships and divergence dates inferred from complete mitochondrial genomes. Mol. Phylogenet. Evol. 75 165-183.
[51] Puente, J. (2013). Distances and algorithms to compare sets of shapes for automated biological morphometrics, Ph.D. thesis, Princeton Univ., Princeton, NJ.
[52] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA. · Zbl 1177.68165
[53] Rodriguez-Nieva, J. F. and Scheurer, M. S. (2019). Identifying topological order through unsupervised machine learning. Nat. Phys.
[54] Rustamov, R. M., Ovsjanikov, M., Azencot, O., Ben-Chen, M., Chazal, F. and Guibas, L. (2013). Map-based exploration of intrinsic shape differences and variability. ACM Trans. Graph. 32 1-12. · Zbl 1305.68274
[55] Schlager, S., Zheng, G., Li, S. and Székely, G. (2017). Morpho and Rvcg—Shape analysis in R: R-packages for geometric morphometrics, shape analysis and surface manipulations. In Statistical Shape and Deformation Analysis: Methods, Implementation and Applications 217-256. Academic Press, San Diego.
[56] Schlager, S., Profico, A., Di Vincenzo, F. and Manzi, G. (2018). Retrodeformation of fossil specimens based on 3D bilateral semi-landmarks: Implementation in the R package “Morpho”. PLoS ONE 13 e0194073.
[57] Schölkopf, B., Herbrich, R. and Smola, A. J. (2001). A generalized representer theorem. In Computational Learning Theory (Amsterdam, 2001). Lecture Notes in Computer Science 2111 416-426. Springer, Berlin. · Zbl 0992.68088 · doi:10.1007/3-540-44581-1_27
[58] Sela, M., Aflalo, Y. and Kimmel, R. (2015). Computational caricaturization of surfaces. Comput. Vis. Image Underst. 141 1-17.
[59] Sellke, T., Bayarri, M. J. and Berger, J. O. (2001). Calibration of \(p\) values for testing precise null hypotheses. Amer. Statist. 55 62-71. · Zbl 1182.62053 · doi:10.1198/000313001300339950
[60] Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2013). A sparse-group lasso. J. Comput. Graph. Statist. 22 231-245. · doi:10.1080/10618600.2012.681250
[61] Singleton, K. R., Crawford, L., Tsui, E., Manchester, H. E., Maertens, O., Liu, X., Liberti, M. V., Magpusao, A. N., Stein, E. M. et al. (2017). Melanoma therapeutic strategies that select against resistance by exploiting MYC-driven evolutionary convergence. Cell Rep. 21 2796-2812.
[62] St Clair, E. M. and Boyer, D. M. (2016). Lower molar shape and size in prosimian and platyrrhine primates. Am. J. Phys. Anthropol. 161 237-258.
[63] Swain, P. S., Stevenson, K., Leary, A., Montano-Gutierrez, L. F., Clark, I. B. N., Vogel, J. and Pilizota, T. (2016). Inferring time derivatives including cell growth rates using Gaussian processes. Nat. Commun. 7 13766.
[64] Turner, K., Mukherjee, S. and Boyer, D. M. (2014). Persistent homology transform for modeling shapes and surfaces. Inf. Inference 3 310-344. · Zbl 06840289 · doi:10.1093/imaiai/iau011
[65] Wang, B., Sudijono, T., Kirveslahti, H., Gao, T., Boyer, D. M., Mukherjee, S. and Crawford, L. (2021). Supplement to “A statistical pipeline for identifying physical features that differentiate classes of 3D shapes.” https://doi.org/10.1214/20-AOAS1430SUPPA.
[66] Wang, B., Sudijono, T., Kirveslahti, H., Gao, T., Boyer, D. M., Mukherjee, S. and Crawford, L. (2021). Source Code for “A statistical pipeline for identifying physical features that differentiate classes of 3D shapes.” https://doi.org/10.1214/20-AOAS1430SUPPB.
[67] Williams, C. K. I. and Barber, D. (1998). Bayesian classification with Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 20 1342-1351.
[68] Worsley, K. J. (1995). Estimating the number of peaks in a random field using the Hadwiger characteristic of excursion sets, with applications to medical images. Ann. Statist. 23 640-669. · Zbl 0898.62120 · doi:10.1214/aos/1176324540
[69] Yang, Y. and Zou, H. (2015). A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput. 25 1129-1141. · Zbl 1331.62343 · doi:10.1007/s11222-014-9498-5
[70] Zhang, Z., Dai, G. and Jordan, M. I. (2011). Bayesian generalized kernel mixed models. J. Mach. Learn. Res. 12 111-139. · Zbl 1280.68221
[71] Zhou, X. and Stephens, M. (2012). Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44 821-825.
[72] Zhu, X. and Stephens, M. (2018). Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat. Commun. 9 4361.
[73] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301-320. · Zbl 1069.62054 · doi:10.1111/j.1467-9868.2005.00503.x
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.