Specifying and implementing nonparametric and semiparametric survival estimators in two-stage (nested) cohort studies with missing case data. (English) Zbl 1119.62365

Summary: Since 1986, we have been studying a cohort of individuals from a region in China with epidemic rates of gastric cardia cancer and have conducted numerous two-stage studies to assess the association of various exposures with this cancer. Two-stage studies are a commonly used statistical design. Stage one involves observing the outcomes and accessible baseline covariate information on all cohort members, and stage two involves using the stage one observations to select a subset of the cohort for measurements of exposures that are difficult to obtain. When the outcomes are censored failure times, such as in our studies, the most common designs used are the case-cohort and nested case-control designs. One limitation of both these designs is that the estimators of the cumulative hazards, and hence survivals and absolute risks, are biased when some cases are missing the stage two measurements. In our experience, such missingness is present in virtually all two-stage studies that (like ours) use biological specimens to obtain exposure measurements. In earlier work we derived and characterized the efficiency of a class of nonparametric and a class of semiparametric cumulative hazard estimators that are unbiased regardless of whether or not all cases are measured. We limit the presentation of the mathematical derivation of these two classes to aspects important to study design and analysis. We analyze data from a two-stage study that we conducted on the association of Helicobacter pylori infection with incident gastric cardia cancers. We discuss the substantive reasons why we deliberately sampled only 25% of the available cancer cases. Through simulations, we demonstrate that substantial variation in precision exists between unbiased estimators within each class, and express the origin of these differences in terms of parameters familiar to investigators. 
We describe how preexisting knowledge about these parameters can be used to increase estimator precision, and detail specific strategies for constructing such estimators. Computer code in R that implements these estimators is available from the authors on request.
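
To illustrate why the cumulative hazard estimate is sensitive to how cases enter stage two, the following is a minimal simulation sketch, not the authors' estimator classes or their R code. It assumes a generic inverse-probability-weighted Nelson-Aalen estimator, exponential failure times, administrative censoring only, and known stage-two sampling probabilities. The 25% case sampling fraction echoes the summary; the 10% non-case fraction, the function names, and all other parameter values are hypothetical.

```python
import random


def nelson_aalen(times, events, weights, t_max):
    """Weighted Nelson-Aalen cumulative hazard estimate at time t_max."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = sum(weights)  # weighted size of the risk set at time 0
    cum_hazard = 0.0
    for i in order:
        if times[i] > t_max:
            break
        if events[i]:
            # Subject i is still in the risk set at its own event time.
            cum_hazard += weights[i] / at_risk
        at_risk -= weights[i]  # subject i leaves the risk set
    return cum_hazard


def compare_estimators(n=20000, hazard=0.05, follow_up=10.0, t_max=8.0,
                       p_case=0.25, p_noncase=0.10, seed=1):
    """Return (full-cohort, unweighted stage-two, IPW stage-two) estimates."""
    rng = random.Random(seed)

    # Stage one: outcomes observed on every cohort member.
    times, events = [], []
    for _ in range(n):
        t_fail = rng.expovariate(hazard)
        events.append(t_fail <= follow_up)    # True if failure is observed
        times.append(min(t_fail, follow_up))  # administrative censoring

    # Stage two: sample a subset, with probability depending on case status.
    # Assumption: these sampling probabilities are known to the analyst.
    probs = [p_case if e else p_noncase for e in events]
    keep = [i for i in range(n) if rng.random() < probs[i]]
    s_times = [times[i] for i in keep]
    s_events = [events[i] for i in keep]

    full = nelson_aalen(times, events, [1.0] * n, t_max)
    naive = nelson_aalen(s_times, s_events, [1.0] * len(keep), t_max)
    ipw = nelson_aalen(s_times, s_events,
                       [1.0 / probs[i] for i in keep], t_max)
    return full, naive, ipw


full, naive, ipw = compare_estimators()
print(f"full cohort: {full:.3f}  unweighted: {naive:.3f}  IPW: {ipw:.3f}")
```

Under these assumptions the true cumulative hazard at `t_max` is `hazard * t_max = 0.4`. The unweighted stage-two estimate overshoots it because cases are over-represented in the subsample, while weighting each sampled subject by the inverse of its sampling probability brings the estimate back in line with the full-cohort value.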

MSC:

62N02 Estimation in survival analysis and censored data
62G05 Nonparametric estimation
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

NestedCohort; R
Full Text: DOI