Joint analysis of semicontinuous data with latent variables. (English) Zbl 07345927

Summary: A two-part latent variable model is proposed to analyze semicontinuous data in the presence of latent variables. The proposed model comprises two major components. The first component is a structural equation model (SEM), which characterizes latent variables through their corresponding multiple attributes and examines the interrelationships among them. The second component is a two-part model that assesses a semicontinuous response of interest. The semicontinuous variable is characterized by a mixture of zero values and continuously distributed positive values. The two-part model handles this variable by splitting it into two random variables: one is a binary indicator of whether the response is zero, and the other is a continuous variable giving the actual level of the positive response. A fully Bayesian approach coupled with a spike-and-slab lasso prior is developed for simultaneous variable selection and parameter estimation. The proposed methodology is demonstrated by a simulation study and applied to the analysis of the Chinese General Social Survey dataset. New insights into the interrelationships among non-cognitive ability, education level, and annual income are obtained.
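
The two ideas named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the exact specification, so the log scale for the positive part, the function names (`split_two_part`, `ssl_prior_pdf`, `slab_probability`), and the toy income values are all assumptions made for illustration. The prior is written in the standard form of a spike-and-slab lasso, a two-component mixture of Laplace densities with a concentrated spike and a diffuse slab.

```python
import math


def split_two_part(y):
    """Split a semicontinuous response into the two parts of a two-part model.

    Returns (u, z): u[i] is 1 if y[i] > 0 and 0 otherwise; z holds log(y[i])
    for the positive responses only. The log scale is an assumption of this
    sketch, not something stated in the summary.
    """
    if any(v < 0 for v in y):
        raise ValueError("semicontinuous responses must be non-negative")
    u = [1 if v > 0 else 0 for v in y]
    z = [math.log(v) for v in y if v > 0]
    return u, z


def laplace_pdf(beta, lam):
    """Laplace density with rate lam: (lam / 2) * exp(-lam * |beta|)."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def ssl_prior_pdf(beta, theta, lam_spike, lam_slab):
    """Spike-and-slab lasso prior density as a two-component Laplace mixture.

    theta is the prior weight on the diffuse slab (small rate lam_slab);
    1 - theta is the weight on the concentrated spike (large rate lam_spike).
    """
    slab = theta * laplace_pdf(beta, lam_slab)
    spike = (1.0 - theta) * laplace_pdf(beta, lam_spike)
    return slab + spike


def slab_probability(beta, theta, lam_spike, lam_slab):
    """Conditional probability that beta came from the slab component."""
    slab = theta * laplace_pdf(beta, lam_slab)
    return slab / ssl_prior_pdf(beta, theta, lam_spike, lam_slab)


# Hypothetical annual incomes: zeros mixed with positive values.
income = [0.0, 12000.0, 0.0, 35000.0, 8000.0]
u, z = split_two_part(income)

# A coefficient near zero is attributed mostly to the spike,
# a clearly non-zero coefficient mostly to the slab.
p_small = slab_probability(0.0, theta=0.5, lam_spike=20.0, lam_slab=1.0)
p_large = slab_probability(3.0, theta=0.5, lam_spike=20.0, lam_slab=1.0)
```

In a full analysis the indicator `u` would typically be modeled with a binary regression and `z` with a continuous regression, both of which may include latent variables as covariates; those steps are outside this sketch.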

MSC:

62J07 Ridge regression; shrinkage estimators (Lasso)
62F15 Bayesian inference
62J05 Linear regression; mixed models
62-08 Computational methods for problems pertaining to statistics
62P25 Applications of statistics to social sciences
