Abstract
There are several procedures for fitting generalized additive models, i.e. regression models for an exponential family response in which the influence of each single covariate is assumed to have an unknown, potentially non-linear shape. Simulated data are used to compare a smoothing parameter optimization approach for the selection of smoothness and of covariates, a stepwise approach, a mixed model approach, and a procedure based on boosting techniques. In particular, it is investigated how the performance of the procedures is linked to the amount of information, the type of response, the total number of covariates, the number of influential covariates, and the extent of non-linearity. Measures for comparison are prediction performance, identification of influential covariates, and smoothness of the fitted functions. One result is that the mixed model approach returns sparse fits with frequently over-smoothed functions, while for the boosting approach the functions are less smooth and variable selection is less strict. The other approaches lie in between with respect to these measures. The boosting procedure performs very well when little information is available and/or when a large number of covariates is to be investigated. It is somewhat surprising that in scenarios with low information the fitting of a linear model, even with stepwise variable selection, has little advantage over the fitting of an additive model when the true underlying structure is linear. In cases with more information the prediction performance of all procedures is very similar. Thus, in difficult data situations the boosting approach can be recommended; in others, the procedure can be chosen according to the aim of the analysis.
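For orientation, an additive model replaces the linear predictor by a sum of smooth, unknown covariate functions. The sketch below illustrates the classical backfitting idea for a Gaussian response; it is not one of the procedures compared in the paper, and the kernel smoother, bandwidth, and simulated data are purely illustrative choices:

```python
import numpy as np

def kernel_smooth(x, y, bandwidth):
    """Nadaraya-Watson kernel smoother, evaluated at the observed x."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ y) / w.sum(axis=1)

def backfit(X, y, bandwidth=0.15, n_iter=20):
    """Backfitting for an additive model y ~ alpha + f1(x1) + ... + fp(xp)."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((p, n))          # fitted function values, one row per covariate
    for _ in range(n_iter):
        for j in range(p):
            # partial residuals: remove intercept and all other components
            partial = y - alpha - f.sum(axis=0) + f[j]
            fj = kernel_smooth(X[:, j], partial, bandwidth)
            f[j] = fj - fj.mean() # center each component for identifiability
    return alpha, f

# simulate data with one non-linear and one quadratic effect
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-1, 1, size=(n, 2))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=0)
```

The bandwidth plays the role of the smoothing parameter whose selection (here fixed by hand) is exactly what the procedures compared in the paper automate in different ways.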
Cite this article
Binder, H., Tutz, G. A comparison of methods for the fitting of generalized additive models. Stat Comput 18, 87–99 (2008). https://doi.org/10.1007/s11222-007-9040-0