Smoothing and mixed models

M. P. Wand¹


Summary

Smoothing methods that use basis functions with penalisation can be formulated as maximum likelihood estimators and best predictors in a mixed model framework. Such connections are at least a quarter of a century old but, perhaps with the advent of mixed model software, have led to a paradigm shift in the field of smoothing. The reason is that most, perhaps all, models involving smoothing can be expressed as a mixed model and hence enjoy the benefit of the growing body of methodology and software for general mixed model analysis. The handling of other complications, such as clustering, missing data and measurement error, is generally quite straightforward with mixed model representations of smoothing.



Acknowledgements

The ideas summarised in this article are the result of interaction with several of my colleagues at Harvard School of Public Health in the period 1997–2002: Babette Brumback, Tianxi Cai, Brent Coull, Jonathan French, Bhaswati Ganguli, Erin Kammann, Long Ngo, Nan Laird, Helen Parise, Louise Ryan, Misha Salganik, Joel Schwartz, John Staudenmayer, Sally Thurston, Jim Ware and Yihua Zhao. The paper has also benefited greatly from conversations with Marc Aerts, Ray Carroll, Gerda Claeskens, Ciprian Crainiceanu, Maria Durban, Jim Hobert, Robert Kohn, Xihong Lin, Mary Lindstrom, Michael O’Connell, José Pinheiro and David Ruppert. I am grateful to Professors Trevor Hastie and Gareth James for making the spinal bone mineral density data available. Finally, thank you to the participants in the Euroworkshop on Nonparametric Models (HPCFCT-2000-00041), held in Bernried, Germany in November 2001, and to its co-organiser, Göran Kauermann, for encouraging me to write this paper. This paper was supported by U.S. National Institute of Environmental Health Sciences grant R01-ES10844-01.

Author information

Authors and Affiliations

Department of Biostatistics, School of Public Health, Harvard University, 665 Huntington Avenue, Boston, MA, 02115, USA
M. P. Wand


Appendix: Demmler-Reinsch orthogonalisation

Let $\mathbf{X}$ and $\mathbf{Z}$ contain the fixed and random effect basis functions for a scatterplot smooth (e.g. as in Section 3.2 or Section 3.6.1). As shown in Section 3.2.1, penalised spline regression then corresponds to the ridge regression

$$\widehat{\mathbf{f}}_\alpha = \mathbf{C}\left(\mathbf{C}^\top\mathbf{C} + \alpha\mathbf{D}\right)^{-1}\mathbf{C}^\top\mathbf{y} \tag{9.1}$$

for some diagonal matrix $\mathbf{D}$, where $\mathbf{C} = [\mathbf{X}\ \mathbf{Z}]$. Here $\alpha$ controls the amount of smoothing, and in the mixed model formulation of penalised splines $\alpha = \sigma_\varepsilon^2/\sigma_u^2$. Algorithm 1 allows fast and numerically stable calculation of (9.1).
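Algorithm 1 itself is not reproduced here, but the steps it must carry out can be read off from the justification later in this appendix: a Cholesky factorisation $\mathbf{C}^\top\mathbf{C} + \delta\mathbf{D} = \mathbf{R}^\top\mathbf{R}$, a symmetric eigendecomposition $\mathbf{R}^{-\top}\mathbf{D}\mathbf{R}^{-1} = \mathbf{U}\,\mathrm{diag}(\mathbf{s})\mathbf{U}^\top$, then $\mathbf{A} = \mathbf{C}\mathbf{R}^{-1}\mathbf{U}$ and $\mathbf{b} = \mathbf{A}^\top\mathbf{y}$. The Python/NumPy sketch below follows those steps under stated assumptions; it is not the paper's own code. The linear truncated-line basis, the synthetic data, the knot placement, the helper names `truncated_line_basis`, `demmler_reinsch` and `dr_fit`, and the choice $\mathbf{D} = \mathrm{diag}(0, 0, 1, \dots, 1)$ (penalising only the random-effect coefficients) are all illustrative assumptions.

```python
import numpy as np


def truncated_line_basis(x, knots):
    """Hypothetical example basis: X = [1, x] (fixed), Z = (x - knot)_+ (random)."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return X, Z


def demmler_reinsch(C, D, delta=1e-10):
    """Return (A, s) with A = C R^{-1} U, where R^{-T} D R^{-1} = U diag(s) U^T."""
    # Cholesky: C^T C + delta*D = R^T R; NumPy returns the lower factor L = R^T.
    R = np.linalg.cholesky(C.T @ C + delta * D).T
    # An explicit inverse keeps the sketch short; triangular solves are preferable.
    R_inv = np.linalg.inv(R)
    M = R_inv.T @ D @ R_inv
    # Symmetric eigendecomposition; symmetrise first to remove rounding asymmetry.
    s, U = np.linalg.eigh(0.5 * (M + M.T))
    s = np.clip(s, 0.0, None)  # M is positive semidefinite; drop tiny negative rounding
    A = C @ R_inv @ U
    return A, s


def dr_fit(A, s, y, alpha):
    """Fitted values A (b / (1 + alpha*s)) with b = A^T y (elementwise division)."""
    b = A.T @ y
    return A @ (b / (1.0 + alpha * s))


# Synthetic illustration (made-up data, not from the paper).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
knots = np.quantile(x, np.linspace(0.0, 1.0, 12)[1:-1])  # 10 interior knots
X, Z = truncated_line_basis(x, knots)
C = np.hstack([X, Z])
D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])

A, s = demmler_reinsch(C, D)
f_hat = dr_fit(A, s, y, alpha=0.01)
```

A direct solve of (9.1), `C @ np.linalg.solve(C.T @ C + 0.01 * D, C.T @ y)`, should agree with `f_hat` up to rounding and the negligible effect of $\delta$, which in effect replaces $\alpha$ by $\alpha + \delta$.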

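The Cholesky decomposition requires a positive definite matrix, so $\mathbf{C}^\top\mathbf{C}$ must be nonsingular, which holds only when $\mathbf{C}$ has full column rank. If $\mathbf{C}$ is ill-conditioned, it is advisable to add a small multiple of $\mathbf{D}$ to $\mathbf{C}^\top\mathbf{C}$ before applying the Cholesky decomposition, so that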

$${{\bf{C}}^{\bf{ \top} }}{\bf{C}}\; + \;\delta {\bf{D}} = {{\bf{R}}^{\bf \top} }{\bf{R}},$$

where δ is small, e.g., δ = 10⁻¹⁰.
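To illustrate the remedy, the Python/NumPy sketch below takes an extreme case of ill-conditioning: a hypothetical truncated-line basis in which one knot is deliberately repeated, so that two columns of $\mathbf{Z}$ coincide and $\mathbf{C}^\top\mathbf{C}$ is exactly singular. The only null direction of $\mathbf{C}$ then lies entirely in the penalised $\mathbf{Z}$ block, so adding $\delta\mathbf{D}$ makes the matrix positive definite and its Cholesky factor exists. The data, the knots and the duplication are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic example (assumption): linear truncated-line basis with one knot
# deliberately repeated, so two columns of Z are identical.
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 1.0, 100))
knots = np.r_[0.2, 0.4, 0.5, 0.5, 0.6, 0.8]  # 0.5 appears twice
X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
C = np.hstack([X, Z])
D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(Z.shape[1])])

CtC = C.T @ C
print(np.linalg.matrix_rank(C), "of", C.shape[1], "columns are linearly independent")

delta = 1e-10
# The null vector of C is the difference of the two duplicated Z coordinates,
# which delta*D penalises, so C^T C + delta*D is positive definite.
R = np.linalg.cholesky(CtC + delta * D).T  # upper triangular, R^T R = CtC + delta*D
```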

Once the matrix $\mathbf{A}$ and the vectors $\mathbf{b}$ and $\mathbf{s}$ have been computed, obtaining the vector of fitted values for a different value of $\alpha$ reduces to an elementwise rescaling of $\mathbf{b}$ followed by a single matrix-vector multiplication. Therefore ${\widehat {\bf{f}}_\alpha }$ can be computed cheaply for many values of $\alpha$, which is particularly useful for automatic smoothing parameter selection.
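As a sketch of how this precomputation pays off, the Python/NumPy code below evaluates a smoothing parameter criterion over a grid of $\alpha$ values, reusing $\mathbf{A}$, $\mathbf{b}$ and $\mathbf{s}$ each time. Generalised cross-validation (GCV) is used only as a familiar example; it is an assumption here, not necessarily the criterion the paper uses. Because $\mathbf{A}^\top\mathbf{A} = \mathbf{I}$, the trace of the smoother matrix (the degrees of freedom of the fit) is simply $\sum_i 1/(1 + \alpha s_i)$. The synthetic data, basis, $\mathbf{D}$ and grid are illustrative assumptions.

```python
import numpy as np

# Synthetic penalised-spline setup (assumed): C = [1, x, (x - knot)_+],
# D = diag(0, 0, 1, ..., 1) so that only the spline coefficients are penalised.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
knots = np.quantile(x, np.linspace(0.0, 1.0, 12)[1:-1])
C = np.column_stack([np.ones_like(x), x, np.maximum(x[:, None] - knots[None, :], 0.0)])
D = np.diag(np.r_[0.0, 0.0, np.ones(knots.size)])

# One-off Demmler-Reinsch precomputation of A, s and b.
R = np.linalg.cholesky(C.T @ C + 1e-10 * D).T
R_inv = np.linalg.inv(R)
M = R_inv.T @ D @ R_inv
s, U = np.linalg.eigh(0.5 * (M + M.T))
s = np.clip(s, 0.0, None)
A = C @ R_inv @ U
b = A.T @ y
n = y.size


def gcv(alpha):
    """GCV score and degrees of freedom for one alpha, reusing A, b and s."""
    shrink = 1.0 / (1.0 + alpha * s)
    f_hat = A @ (b * shrink)   # the single matrix-vector product
    df = shrink.sum()          # trace of the smoother matrix, using A^T A = I
    rss = np.sum((y - f_hat) ** 2)
    return n * rss / (n - df) ** 2, df


alphas = 10.0 ** np.arange(-6.0, 3.5, 0.5)
results = np.array([gcv(a) for a in alphas])
scores, dfs = results[:, 0], results[:, 1]
alpha_best = alphas[np.argmin(scores)]
```

Each grid point costs one $n \times p$ matrix-vector product, with no further factorisation, which is the saving the text describes.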

Justification of Algorithm 1

Now

$${{\bf{R}}^{ - \top }}{\bf{D}}{{\bf{R}}^{ - 1}} = {\bf{U}}{\mathop{\rm diag}\nolimits} ({\bf{s}}){{\bf{U}}^ \top }\quad {\rm{ with }}\quad {{\bf{U}}^ \top }{\bf{U}} = {\bf{I}}.$$

Since $\mathbf{U}$ is square and $\mathbf{U}^\top\mathbf{U} = \mathbf{I}$, it follows that $\mathbf{U}^\top = \mathbf{U}^{-1}$, and so

$${\bf{D}}\; = \;{{\bf{R}}^ \top }{\bf{U}}{\mathop{\rm diag}\nolimits} ({\bf{s}}){{\bf{U}}^{ - 1}}\;{\bf{R}}.$$

Also,

$${{\bf{C}}^ \top }{\bf{C}}\; = \;{{\bf{R}}^ \top }{\bf{R}} = {{\bf{R}}^ \top }{\bf{U}}{{\bf{U}}^{ - 1}}{\bf{R}}$$

and consequently

$$\mathbf{C}^\top\mathbf{C} + \alpha\mathbf{D} = \mathbf{R}^\top\mathbf{U}\{\mathbf{I} + \alpha\,\mathrm{diag}(\mathbf{s})\}\mathbf{U}^{-1}\mathbf{R}.$$

Hence

$$\begin{aligned}
\widehat{\mathbf{f}}_\alpha &= \mathbf{C}\left[\mathbf{R}^\top\mathbf{U}\{\mathbf{I} + \alpha\,\mathrm{diag}(\mathbf{s})\}\mathbf{U}^{-1}\mathbf{R}\right]^{-1}\mathbf{C}^\top\mathbf{y}\\
&= (\mathbf{C}\mathbf{R}^{-1}\mathbf{U})\{\mathrm{diag}(\mathbf{1} + \alpha\mathbf{s})\}^{-1}(\mathbf{C}\mathbf{R}^{-1}\mathbf{U})^\top\mathbf{y} = \mathbf{A}\left(\frac{\mathbf{b}}{\mathbf{1} + \alpha\mathbf{s}}\right)
\end{aligned}$$

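where $\mathbf{A} \equiv \mathbf{C}\mathbf{R}^{-1}\mathbf{U}$ and $\mathbf{b} \equiv \mathbf{A}^\top\mathbf{y}$, and the division in the last expression is elementwise.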

The new expression for ${\widehat {\bf{f}}_\alpha }$ is thus of the form

$${\widehat {\bf{f}}_\alpha } = {\bf{A}}{\left\{ {{{\bf{A}}^ \top }{\bf{A}}\; + \;\alpha {\rm{diag}}\left( {\bf{s}} \right)} \right\}^{ - 1}}{{\bf{A}}^ \top }{\bf{y}}.$$

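Comparison with (9.1) shows that we have effectively replaced the basis functions in $\mathbf{C}$ with those in $\mathbf{A}$, and the penalty matrix $\mathbf{D}$ with $\mathrm{diag}(\mathbf{s})$. The new design matrix has the orthogonality property $\mathbf{A}^\top\mathbf{A} = \mathbf{U}^\top\mathbf{R}^{-\top}\mathbf{C}^\top\mathbf{C}\mathbf{R}^{-1}\mathbf{U} = \mathbf{U}^\top\mathbf{U} = \mathbf{I}$, which is what allows the expression above to be written in this form. The columns of $\mathbf{A}$ form the Demmler-Reinsch basis for the column space of $\mathbf{C}$. The orthogonality property is crucial for fast computation over several smoothing parameters: because $\mathbf{A}^\top\mathbf{A} + \alpha\,\mathrm{diag}(\mathbf{s})$ is diagonal, no matrix inversion is needed for a new value of $\alpha$.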
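The properties claimed above, that $\mathbf{A}^\top\mathbf{A} = \mathbf{I}$, that $\mathbf{A}$ spans the same space as $\mathbf{C}$, and that the penalty becomes $\mathrm{diag}(\mathbf{s})$ in the new coordinates, are easy to check numerically. The Python/NumPy sketch below does so for an assumed truncated-line spline basis on synthetic $x$ values; it takes $\delta = 0$ because this $\mathbf{C}$ has full column rank. The coefficient map $\mathbf{T} = \mathbf{R}^{-1}\mathbf{U}$ (so that $\mathbf{A} = \mathbf{C}\mathbf{T}$) is a name introduced here for illustration.

```python
import numpy as np

# Assumed synthetic penalised-spline basis (not data from the paper).
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 1.0, 80))
knots = np.quantile(x, np.linspace(0.0, 1.0, 10)[1:-1])
C = np.column_stack([np.ones_like(x), x, np.maximum(x[:, None] - knots[None, :], 0.0)])
D = np.diag(np.r_[0.0, 0.0, np.ones(knots.size)])

# Demmler-Reinsch basis with delta = 0 (C has full column rank here).
R = np.linalg.cholesky(C.T @ C).T
R_inv = np.linalg.inv(R)
M = R_inv.T @ D @ R_inv
s, U = np.linalg.eigh(0.5 * (M + M.T))
T = R_inv @ U  # maps new coefficients to old ones: C T = A
A = C @ T

# Orthonormal columns: A^T A = I.
print(np.abs(A.T @ A - np.eye(A.shape[1])).max())
# Same column space: the projection A A^T equals the hat matrix of C.
H = C @ np.linalg.solve(C.T @ C, C.T)
print(np.abs(A @ A.T - H).max())
# In the new coordinates the penalty matrix D becomes diag(s).
print(np.abs(T.T @ D @ T - np.diag(s)).max())
```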


About this article

Cite this article

Wand, M.P. Smoothing and mixed models. Computational Statistics 18, 223–249 (2003). https://doi.org/10.1007/s001800300142


Published: 04 November 2019
Issue Date: July 2003
DOI: https://doi.org/10.1007/s001800300142
