
Regression for non-Euclidean data using distance matrices. (English) Zbl 1514.62555

Summary: Regression methods for common data types such as measured, count and categorical variables are well understood, but statisticians increasingly need ways to model relationships between variable types such as shapes, curves, trees, correlation matrices and images that do not fit into the standard framework. Data types that lie in metric spaces but not in vector spaces are difficult to use within the usual regression setting, whether as the response, a predictor, or both. We represent the information in these variables using distance matrices, which requires only the specification of a distance function. A low-dimensional representation of such distance matrices can be obtained using methods such as multidimensional scaling. Once these variables have been represented as scores, an internal model linking the predictors and the responses can be developed using standard methods. We call the transformation from a new observation to a score scoring, whereas backscoring is a method for representing a score as an observation in the data space. Both methods are essential for prediction and explanation. We illustrate the methodology for shape data, unregistered curve data and correlation matrices using motion capture data from an experiment to study the motion of children with cleft lip.
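
The pipeline described in the summary (distance matrix, low-dimensional scores, internal model, scoring of a new observation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes classical (Torgerson) multidimensional scaling for the scores, Gower's add-a-point formula (Gower, 1968, in the reference list) for scoring, and ordinary least squares as the internal model. All function names are hypothetical, and Euclidean points stand in for non-Euclidean objects so that the result can be checked. Backscoring is not shown.

```python
import numpy as np

def classical_mds(D, k):
    """Principal coordinates (scores) from an n x n distance matrix D.

    Assumes the k largest eigenvalues of the centered matrix are positive.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # doubly centered inner products
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:k]          # indices of the k largest eigenvalues
    lam = evals[top]
    X = evecs[:, top] * np.sqrt(lam)           # n x k score matrix
    return X, lam, np.diag(B)

def score_new(d_new, X, lam, b):
    """Scoring: map distances from a new observation to a score.

    Uses x = 0.5 * Lambda^{-1} X^T (b - d^2), with b = diag(B).
    """
    return 0.5 * (X.T @ (b - d_new ** 2)) / lam

def fit_internal_model(S_pred, S_resp):
    """Least-squares model (with intercept) from predictor scores to response scores."""
    A = np.column_stack([np.ones(len(S_pred)), S_pred])
    coef, *_ = np.linalg.lstsq(A, S_resp, rcond=None)
    return coef

def predict_scores(coef, s_pred):
    """Predicted response score for a single predictor score vector."""
    return np.concatenate([[1.0], s_pred]) @ coef

# Stand-in data: 20 objects whose "distance function" is Euclidean distance in R^3.
rng = np.random.default_rng(0)
objects = rng.normal(size=(20, 3))
D = np.linalg.norm(objects[:, None, :] - objects[None, :, :], axis=-1)
X, lam, b = classical_mds(D, k=3)

# Score a new observation using only its distances to the training objects.
new_obj = rng.normal(size=3)
d_new = np.linalg.norm(objects - new_obj, axis=1)
s_new = score_new(d_new, X, lam, b)

# Check whether distances in score space match the original distances.
print(np.allclose(np.linalg.norm(X - s_new, axis=1), d_new))
```

The only object-specific ingredient in this sketch is the distance matrix `D`; for shapes, curves or correlation matrices one would substitute an appropriate distance function. With a distance that is not Euclidean, some eigenvalues can be negative, in which case only the components with positive eigenvalues can be used in this form.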

MSC:

62-XX Statistics

Software:

shapes; pls; fda (R)

References:

[1] C. Cuadras and C. Arenas, A distance based regression model for prediction with mixed data, Comm. Statist. Theory Methods 19(6) (1990), pp. 2261-2279. doi: 10.1080/03610929008830319
[2] M.J. Daniels and M. Pourahmadi, Bayesian analysis of covariance matrices and dynamic models for longitudinal data, Biometrika 89(3) (2002), pp. 553-566. doi: 10.1093/biomet/89.3.553 · Zbl 1036.62019
[3] I. Dryden, Shapes: Statistical Shape Analysis, R package version 1.1-3, 2009. Available at http://www.r-project.org.
[4] I. Dryden and K. Mardia, Statistical Shape Analysis, Wiley, Chichester, 1998. · Zbl 0901.62072
[5] I.L. Dryden, A. Koloydenko, and D. Zhou, Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging, Ann. Appl. Statist. 3(3) (2009), pp. 1102-1123. doi: 10.1214/09-AOAS249 · Zbl 1196.62063
[6] J. Faraway, Backscoring in principal coordinates analysis, J. Comput. Graph. Statist. 21 (2012), pp. 394-412. doi: 10.1080/10618600.2012.672097
[7] T. Fletcher, Geodesic regression on Riemannian manifolds, Proceedings of the Third International Workshop on Mathematical Foundations of Computational Anatomy-Geometrical and Statistical Methods for Modelling Biological Shape Variability, 2011, pp. 75-86.
[8] J. Gower, Adding a point to vector diagrams in multivariate analysis, Biometrika 55(3) (1968), pp. 582-585. doi: 10.1093/biomet/55.3.582 · Zbl 0167.17802
[9] M. Herdin, N. Czink, H. Ozcelik, and E. Bonek, Correlation matrix distance, a meaningful measure for evaluation of non-stationary MIMO channels, Vehicular Technology Conference, 2005. VTC 2005-Spring. 2005 IEEE 61st, Vol. 1, 2005, pp. 136-140.
[10] N. Higham, Computing the nearest correlation matrix – a problem from finance, IMA J. Numer. Anal. 22(3) (2002), pp. 329-343. doi: 10.1093/imanum/22.3.329 · Zbl 1006.65036
[11] S. de Jong, SIMPLS: An alternative approach to partial least squares regression, Chemometr. Intell. Lab. Syst. 18 (1993), pp. 251-263. doi: 10.1016/0169-7439(93)85002-X
[12] P. Legendre, F. Lapointe, and P. Casgrain, Modeling brain evolution from behavior: A permutational regression approach, Evolution 48 (1994), pp. 1487-1499. doi: 10.2307/2410243
[13] J.W. Lichstein, Multiple regression on distance matrices: A multivariate spatial analysis tool, Plant Ecol. 188(2) (2006), pp. 117-131. doi: 10.1007/s11258-006-9126-3
[14] B. McArdle and M. Anderson, Fitting multivariate models to community data: A comment on distance-based redundancy analysis, Ecology 82(1) (2001), pp. 290-297. doi: 10.1890/0012-9658(2001)082[0290:FMMTCD]2.0.CO;2
[15] B. Mevik and R. Wehrens, The pls package: Principal component and partial least squares regression in R, J. Statist. Softw. 18(2) (2007), pp. 1-24.
[16] M. Niethammer, Y. Huang, and F.-X. Vialard, Geodesic regression for image time-series, Medical Image Computing and Computer-Assisted Intervention-MICCAI 2011, Springer, 2011, pp. 655-662.
[17] J. Ramsay and B. Silverman, Functional Data Analysis, 2nd ed., Springer, New York, 2005. · Zbl 1079.62006 · doi:10.1007/b98888
[18] C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, The MIT Press, Cambridge, MA, 2006. · Zbl 1177.68165 · doi:10.1007/978-3-540-28650-9_4
[19] A. Srivastava, W. Wu, S. Kurtek, E. Klassen, and J. Marron, Registration of functional data using Fisher-Rao metric, preprint (2011). Available at arXiv:1103.3817.
[20] J. Tenenbaum, V. de Silva, and J. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000), pp. 2319-2323. doi: 10.1126/science.290.5500.2319
[21] C.-A. Trotman, J. Faraway, C. Philips, and J. van Aalst, Effects of lip revision surgery in cleft lip/palate patients, J. Dent. Res. 89 (2010), pp. 728-732. doi: 10.1177/0022034510365485
[22] H. Wang and J. Marron, Object oriented data analysis: Sets of trees, Ann. Statist. 35(5) (2007), pp. 1849-1873. doi: 10.1214/009053607000000217 · Zbl 1126.62002
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.