Bayesian spatial homogeneity pursuit for survival data with an application to the SEER respiratory cancer data. (English) Zbl 1520.62209

Summary: In this work, we propose a new Bayesian spatial homogeneity pursuit method for survival data under the proportional hazards model to detect spatially clustered patterns in baseline hazards and regression coefficients. Specifically, the regression coefficients and baseline hazards are assumed to exhibit a spatial homogeneity pattern. To capture such homogeneity, we develop a geographically weighted Chinese restaurant process prior to simultaneously estimate the coefficients and baseline hazards together with their uncertainty measures. An efficient Markov chain Monte Carlo (MCMC) algorithm is designed for the proposed method. Its performance is evaluated using simulated data, and the method is further applied to a real data analysis of respiratory cancer in the state of Louisiana.
{© 2021 The International Biometric Society.}
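
Illustrative sketch (not taken from the paper): the summary names a geographically weighted Chinese restaurant process prior but does not give its form. The Python snippet below shows one generic way a distance-weighted Chinese restaurant process can be sampled sequentially. The exponential decay weight, the bandwidth `h`, the concentration `alpha`, and the function name `sample_weighted_crp` are all assumptions made for illustration and may differ from the authors' construction.

```python
import numpy as np


def sample_weighted_crp(dist, alpha=1.0, h=1.0, seed=None):
    """Draw one partition from a distance-weighted CRP (illustrative only).

    dist  : (n, n) array of pairwise distances between locations
    alpha : concentration parameter (mass given to opening a new cluster)
    h     : hypothetical bandwidth; h = 0 recovers the ordinary CRP
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    weights = np.exp(-h * dist)          # assumed decay: nearer sites weigh more
    labels = np.zeros(n, dtype=int)      # the first location opens cluster 0
    n_clusters = 1
    for i in range(1, n):
        # Mass of each existing cluster: summed weights between location i
        # and the already-assigned members of that cluster.
        mass = np.array([
            weights[i, :i][labels[:i] == c].sum() for c in range(n_clusters)
        ])
        probs = np.append(mass, alpha)   # last entry: open a new cluster
        probs /= probs.sum()
        choice = rng.choice(n_clusters + 1, p=probs)
        labels[i] = choice
        if choice == n_clusters:
            n_clusters += 1
    return labels


# Usage: ten locations on a line, with absolute differences as distances.
coords = np.arange(10, dtype=float)
dist = np.abs(coords[:, None] - coords[None, :])
labels = sample_weighted_crp(dist, alpha=1.0, h=0.5, seed=0)
```

With `h = 0` every weight equals one, so each cluster's mass is its current size and the draw reduces to the standard Chinese restaurant process. Larger `h` makes nearby locations more likely to share a cluster.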

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

spBayes; fossil

References:

[1] Banerjee, S., Carlin, B.P., & Gelfand, A.E. (2014) Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: CRC Press.
[2] Banerjee, S., & Dey, D.K. (2005) Semiparametric proportional odds models for spatially correlated survival data. Lifetime Data Analysis, 11, 175-191. · Zbl 1080.62085
[3] Banerjee, S., Wall, M.M., & Carlin, B.P. (2003) Frailty modeling for spatially correlated survival data, with application to infant mortality in Minnesota. Biostatistics, 4, 123-142. · Zbl 1142.62420
[4] Bhatt, V., & Tiwari, N. (2014) A spatial scan statistic for survival data based on Weibull distribution. Statistics in Medicine, 33, 1867-1876.
[5] Blackwell, D., & MacQueen, J.B. (1973) Ferguson distributions via Pólya urn schemes. The Annals of Statistics, 1, 353-355. · Zbl 0276.62010
[6] Blei, D.M., & Frazier, P.I. (2011) Distance dependent Chinese restaurant processes. Journal of Machine Learning Research, 12, 2461-2488. · Zbl 1280.68157
[7] Chen, M.‐H., Shao, Q.‐M., & Ibrahim, J.G. (2000) Monte Carlo Methods in Bayesian Computation. New York: Springer‐Verlag. · Zbl 0949.65005
[8] Cox, D.R. (1972) Regression models and life‐tables. Journal of the Royal Statistical Society B, 34, 187-220. · Zbl 0243.62041
[9] Dahl, D.B. (2006) Model‐based clustering for expression data via a Dirichlet process mixture model. In: Do, K.‐A., Müller, P., & Vannucci, M. (Eds.) Bayesian Inference for Gene Expression and Proteomics, Vol. 4. Cambridge: Cambridge University Press, pp. 201-218. · Zbl 1182.62050
[10] Friedman, M. (1982) Piecewise exponential models for survival data with covariates. The Annals of Statistics, 10, 101-113. · Zbl 0483.62086
[11] Geisser, S. (1993). Predictive Inference: An Introduction. London: Chapman & Hall. · Zbl 0824.62001
[12] Gelfand, A.E., & Dey, D.K. (1994) Bayesian model choice: asymptotics and exact calculations. Journal of the Royal Statistical Society: Series B (Methodological), 56, 501-514. · Zbl 0800.62170
[13] Gelfand, A.E., Dey, D.K., & Chang, H. (1992) Model determination using predictive distributions with implementation via sampling‐based methods. Department of Statistics, Stanford University, CA. In: Bayesian Statistics, Vol. 4. University Press.
[14] Gelfand, A.E., Kim, H.‐J., Sirmans, C., & Banerjee, S. (2003) Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association, 98, 387-396. · Zbl 1041.62041
[15] Green, P.J. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711-732. · Zbl 0861.62023
[16] Griffiths, T.L., Jordan, M.I., Tenenbaum, J.B., & Blei, D.M. (2004) Hierarchical topic models and the nested Chinese restaurant process. Advances in Neural Information Processing Systems, 16, 17-24.
[17] Henderson, R., Shimakura, S., & Gorst, D. (2002) Modeling spatial variation in leukemia survival data. Journal of the American Statistical Association, 97, 965-972. · Zbl 1048.62102
[18] Hu, G., Geng, J., Xue, Y., & Sang, H. (2020) Bayesian spatial homogeneity pursuit of functional data: an application to the U.S. income distribution. arXiv preprint arXiv:2002.06663.
[19] Hu, G., & Huffer, F. (2020) Modified Kaplan-Meier estimator and Nelson-Aalen estimator with geographical weighting for survival data. Geographical Analysis, 52, 28-48.
[20] Hu, G., Xue, Y., & Huffer, F. (2020) A comparison of Bayesian accelerated failure time models with spatially varying coefficients. Sankhya B, 1-17.
[21] Huang, L., Pickle, L.W., Stinchcomb, D., & Feuer, E.J. (2007) Detection of spatial clusters: application to cancer survival as a continuous outcome. Epidemiology, 18, 73-87.
[22] Ibrahim, J.G., Chen, M.‐H., & Sinha, D. (2001). Bayesian Survival Analysis. New York: Springer‐Verlag. · Zbl 0978.62091
[23] Lee, J., Gangnon, R.E., & Zhu, J. (2017) Cluster detection of spatial regression coefficients. Statistics in Medicine, 36, 1118-1133.
[24] Lee, J., Kamenetsky, M.E., Gangnon, R.E., & Zhu, J. (2020) Clustered spatio‐temporal varying coefficient regression model. Statistics in Medicine, 40, 465-480.
[25] Li, F., & Sang, H. (2019) Spatial homogeneity pursuit of regression coefficients for large datasets. Journal of the American Statistical Association, 114, 1050-1062. · Zbl 1428.62212
[26] Lu, J., Li, M., & Dunson, D. (2018) Reducing over‐clustering via the powered Chinese restaurant process. arXiv preprint arXiv:1802.05392.
[27] Ma, Z., Xue, Y., & Hu, G. (2020) Heterogeneous regression models for clusters of spatial dependent data. Spatial Economic Analysis, 15, 459-475.
[28] Miller, J.W., & Harrison, M.T. (2013) A simple example of Dirichlet process mixture inconsistency for the number of components. Advances in Neural Information Processing Systems, 26, 199-206.
[29] Mu, J., Liu, Q., Kuo, L., & Hu, G. (2020) Bayesian variable selection for Cox regression model with spatially varying coefficients with applications to Louisiana respiratory cancer data. arXiv preprint arXiv:2008.00615.
[30] Neal, R.M. (2000) Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9, 249-265.
[31] Pitman, J. (1995) Exchangeable and partially exchangeable random partitions. Probability Theory and Related Fields, 102, 145-158. · Zbl 0821.60047
[32] Rand, W.M. (1971) Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846-850.
[33] SEER Program (2016) Public‐use data (1973‐2015). National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, released April 2016, based on the November 2015 submission.
[34] Tobler, W.R. (1970) A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234-240.
[35] Vavrek, M.J. (2011) Fossil: palaeoecological and palaeogeographical analysis tools. Palaeontologia Electronica, 14, 16.
[36] Xue, Y., Schifano, E.D., & Hu, G. (2020) Geographically weighted Cox regression for prostate cancer survival data in Louisiana. Geographical Analysis, 52, 570-587.
[37] Zhang, J., & Lawson, A.B. (2011) Bayesian parametric accelerated failure time spatial model and its application to prostate cancer. Journal of Applied Statistics, 38, 591-603. · Zbl 1511.62389
[38] Zhao, P., Yang, H.‐C., Dey, D.K., & Hu, G. (2020) Bayesian spatial homogeneity pursuit regression for count value data. arXiv preprint arXiv:2002.06678.
[39] Zhou, H., Lawson, A.B., Hebert, J.R., Slate, E.H., & Hill, E.G. (2008) Joint spatial survival modeling for the age at diagnosis and the vital outcome of prostate cancer. Statistics in Medicine, 27, 3612-3628.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.