Prediction and classification when the diagnostic classes are related. (English) Zbl 0900.62593

Summary: We consider prediction and classification into diagnostic classes consisting of individuals who may suffer from multiple diseases. For instance, in a cardiovascular context a patient may need bypass surgery, a valve replacement, or both. The popular multigroup logistic model is suitable for prediction into nominal classes but does not exploit the underlying structure of the classes, so it is not entirely suitable for this situation. Computational difficulties also often occur with the multigroup logistic model when the classes are of this nature. A modified form of the model, applicable to some economic applications, is not appropriate for most medical applications. Instead, we suggest the n-way Dale model, also called the marginal logistic model. It is shown that this model is computationally more stable, although more involved, and allows better interpretation of the parameters. To illustrate our ideas we use the POPS data set, in which the child’s abilities at the age of 2 are predicted from risk factors at delivery. A simulation study is performed to indicate the gain in classification ability in comparison with the multigroup logistic model. It is also shown that, in terms of the parameter estimates, the Dale model is more sensitive to the choice of the sampling scheme than the multigroup logistic model.
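
To make the idea of a marginal logistic model concrete, the following is a minimal sketch of the bivariate case only: two binary disease indicators, each with its own marginal logit, linked by an odds ratio through the Plackett distribution. The summary does not give the paper's n-way parameterization or estimation procedure, so the function names, the class labels, and the choice to pass linear predictors directly are all assumptions made for illustration.

```python
import math


def expit(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def plackett_joint(p1, p2, psi):
    """P(Y1 = 1, Y2 = 1) for marginals p1, p2 and odds ratio psi > 0.

    Standard Plackett construction; psi = 1 corresponds to independence.
    """
    if abs(psi - 1.0) < 1e-12:
        return p1 * p2
    s = 1.0 + (p1 + p2) * (psi - 1.0)
    disc = s * s - 4.0 * psi * (psi - 1.0) * p1 * p2
    return (s - math.sqrt(disc)) / (2.0 * (psi - 1.0))


def class_probabilities(eta1, eta2, log_psi):
    """Probabilities of the four diagnostic classes.

    eta1, eta2: marginal logits for the two conditions (hypothetical inputs;
    in practice each would be a linear function of the risk factors).
    log_psi: log odds ratio describing the association between conditions.
    """
    p1, p2 = expit(eta1), expit(eta2)
    p11 = plackett_joint(p1, p2, math.exp(log_psi))
    return {
        "both": p11,
        "first_only": p1 - p11,
        "second_only": p2 - p11,
        "neither": 1.0 - p1 - p2 + p11,
    }


def classify(eta1, eta2, log_psi):
    """Allocate to the class with the largest probability."""
    probs = class_probabilities(eta1, eta2, log_psi)
    return max(probs, key=probs.get)
```

In this sketch the two marginal logits keep their usual single-disease interpretation regardless of the association parameter, which is one way to read the summary's remark about better interpretability; a multigroup logistic model would instead treat the four classes as unordered categories.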

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
65C99 Probabilistic methods, stochastic differential equations

Software:

APL
