Statistics and Its Interface

Volume 1 (2008)

Number 1

Semiparametric latent covariate mixed-effects models with application to a colon carcinogenesis study

Pages: 75 – 86



Zonghui Hu (NIAID, National Institutes of Health, Bethesda, Md., U.S.A.)

Naisyin Wang (Department of Statistics, Texas A&M University, College Station, Tx., U.S.A.)


We study a mixed-effects model in which the response and the main covariate are linked by position. Although the covariate corresponding to the observed response is not directly observable, there exists a latent covariate process that represents the underlying positional features of the covariate. When the positional features and the underlying distributions are parametric, the expectation-maximization (EM) algorithm is the most commonly used procedure. Without these parametric assumptions, however, the practical feasibility of a semiparametric EM algorithm, and of the corresponding inference procedures, remains to be investigated. In this paper, we propose a semiparametric approach and identify the conditions under which the semiparametric estimators share the same asymptotic properties as the infeasible estimators computed from the true values of the latent covariate; that is, the oracle property is achieved. We propose a Monte Carlo graphical evaluation tool to assess whether the sample size is adequate for achieving the oracle property. The semiparametric approach is then applied to data from a colon carcinogenesis study on the effects of cell DNA damage on the expression level of the oncogene bcl-2. The graphical evaluation shows that, with subunits of moderate size, the numerical performance of the semiparametric estimator is very close to its asymptotic limit. This suggests that a more complex EM-based implementation would yield at most minimal improvement and is therefore unnecessary.
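To make the oracle comparison concrete, the following is a minimal sketch of the general idea, not the authors' procedure. Everything in it is an assumption chosen for illustration: a hypothetical data-generating model (`latent_curve`, a random intercept per unit, a noisy covariate observed at positions different from the response positions), a Gaussian-kernel local linear smoother to recover the latent covariate at each response position, and ordinary least squares with working independence standing in for the paper's estimating equations. The Monte Carlo loop compares the resulting "semiparametric" slope estimates with "oracle" estimates that use the true latent covariate.

```python
import numpy as np


def latent_curve(t, a, b):
    # Hypothetical smooth positional feature of the latent covariate process.
    return a + b * np.sin(2 * np.pi * t)


def local_linear(s, w, t0, h):
    # Local linear estimate of E[w | s = t0] at each target t0 (Gaussian kernel).
    d = s[None, :] - t0[:, None]            # shape (targets, observations)
    k = np.exp(-0.5 * (d / h) ** 2)         # kernel weights
    s0 = k.sum(axis=1)
    s1 = (k * d).sum(axis=1)
    s2 = (k * d * d).sum(axis=1)
    wts = k * (s2[:, None] - s1[:, None] * d)
    return (wts * w[None, :]).sum(axis=1) / (s0 * s2 - s1 ** 2)


def one_replicate(rng, n_units=30, m_sub=20, beta=(1.0, 2.0), h=0.08):
    # Simulate one data set; return (smoothed-covariate fit, oracle fit).
    y_all, xhat_all, xtrue_all = [], [], []
    for _ in range(n_units):
        a, b = rng.normal(0.0, 1.0), rng.normal(1.0, 0.3)
        u_i = rng.normal(0.0, 0.5)                       # random unit effect
        t = np.sort(rng.uniform(0, 1, m_sub))            # response positions
        s = np.sort(rng.uniform(0, 1, m_sub))            # covariate positions
        w = latent_curve(s, a, b) + rng.normal(0, 0.3, m_sub)  # noisy covariate
        x_true = latent_curve(t, a, b)                   # unobservable in practice
        y = beta[0] + beta[1] * x_true + u_i + rng.normal(0, 0.5, m_sub)
        y_all.append(y)
        xtrue_all.append(x_true)
        xhat_all.append(local_linear(s, w, t, h))
    y = np.concatenate(y_all)

    def ols(x):
        # Working-independence least squares for (intercept, slope).
        design = np.column_stack([np.ones_like(x), x])
        return np.linalg.lstsq(design, y, rcond=None)[0]

    return ols(np.concatenate(xhat_all)), ols(np.concatenate(xtrue_all))


def monte_carlo(n_rep=200, seed=0, **kw):
    # Repeat the simulation; rows are replicates, columns are (intercept, slope).
    rng = np.random.default_rng(seed)
    semi, oracle = zip(*(one_replicate(rng, **kw) for _ in range(n_rep)))
    return np.array(semi), np.array(oracle)


semi, oracle = monte_carlo(m_sub=20)
print("smoothed slope: mean %.3f, sd %.3f" % (semi[:, 1].mean(), semi[:, 1].std()))
print("oracle slope:   mean %.3f, sd %.3f" % (oracle[:, 1].mean(), oracle[:, 1].std()))
```

Repeating `monte_carlo` over a grid of `m_sub` values and plotting the two sets of estimates side by side gives a rough graphical check, in the spirit described above, of how many subunits are needed before the smoothed-covariate estimator behaves like the oracle one.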


Keywords: carcinogenesis, consistency, generalized estimating equation, local linear smoothing, mixed-effects model
