Statistics and Its Interface

Volume 9 (2016)

Number 2

Bayesian model assessments in evaluating mixtures of longitudinal trajectories and their associations with cross-sectional health outcomes

Pages: 183 – 201



Bei Jiang (Department of Biostatistics, Columbia University, New York, N.Y., U.S.A.; and Division of Biostatistics, Department of Child and Adolescent Psychiatry, New York University, New York, N.Y., U.S.A.)

Michael R. Elliott (Department of Biostatistics, Survey Methodology Program, Institute for Social Research, University of Michigan, Ann Arbor, Mich., U.S.A.)

Mary D. Sammel (Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Penn., U.S.A.)

Naisyin Wang (Department of Statistics, University of Michigan, Ann Arbor, Mich., U.S.A.)


In joint-modeling analyses that simultaneously consider a set of longitudinal predictors and a primary outcome, the two most frequently used response-versus-longitudinal-trajectory models utilize latent class (LC) and multiple shared random effects (MSRE) predictors. In practice, a single model assessment criterion is often used to justify the choice of model; how different criteria perform under the joint longitudinal predictor-scalar outcome model is less well understood. In this paper, we evaluate six Bayesian model assessment criteria: the Akaike information criterion (AIC) (Akaike, 1973), the Bayesian information criterion (BIC) (Schwarz, 1978), the integrated classification likelihood criterion (ICL) (Biernacki et al., 1998), the deviance information criterion (DIC) (Spiegelhalter et al., 2002), the logarithm of the pseudo-marginal likelihood (LPML) (Geisser and Eddy, 1979) and the widely applicable information criterion (WAIC) (Watanabe, 2010). When needed, the criteria are modified, following the Bayesian principle, to accommodate the joint modeling framework that analyzes longitudinal predictors and binary health outcome data. We report our evaluation based on empirical numerical studies, exploring the relationships and similarities among these criteria. We focus on two evaluation aspects: goodness-of-fit adjusted for the complexity of the models, mostly reflected by the numbers of latent features/classes in the longitudinal trajectories that are part of the hierarchical structure in the joint models, and prediction evaluation based on both training and test samples as well as their contrasts.

Our results indicate that all six criteria have difficulty separating deeply overlapping latent features, with AIC, BIC, ICL and WAIC outperforming the others in correctly identifying the number of latent classes. With respect to prediction, DIC, WAIC and LPML tend to choose models with too many latent classes, which nevertheless achieve better predictive performance on independent validation samples than the models chosen by the other criteria. We also report an interesting result concerning incorrect model choices. Finally, we use the results from the simulation study to identify suitable candidate models that link the useful features in follicle stimulating hormone trajectories to the risk of severe hot flashes in the Penn Ovarian Aging Study.
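To illustrate one of the criteria compared above, the sketch below computes WAIC (Watanabe, 2010) from a matrix of pointwise log-likelihoods evaluated at posterior draws, using the standard lppd and effective-parameter-count decomposition. This is a generic illustration, not the authors' joint-model implementation; the toy normal-mean model and all variable names are assumptions for demonstration only.

```python
import numpy as np

def waic(log_lik):
    """WAIC on the deviance scale from an (S draws x n observations)
    matrix of pointwise log-likelihoods at posterior draws.

    WAIC = -2 * (lppd - p_waic), where lppd is the log pointwise
    predictive density and p_waic is the effective number of parameters.
    """
    S = log_lik.shape[0]
    # lppd: log of the posterior-mean likelihood, summed over observations
    # (log-sum-exp over draws for numerical stability)
    lppd = np.sum(np.logaddexp.reduce(log_lik, axis=0) - np.log(S))
    # p_waic: posterior variance of the log-likelihood, summed over observations
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)

# Toy illustration (hypothetical data): normal-mean model with known variance 1
rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=50)            # observed data
mu_draws = rng.normal(0.0, 0.1, size=200)    # posterior draws of the mean
# pointwise log-likelihoods, shape (200, 50)
ll = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2
print(waic(ll))
```

Lower WAIC values indicate better expected out-of-sample predictive fit, which is why WAIC and LPML can favor richer latent-class structures than BIC or ICL.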


Bayesian model assessment, joint models, latent class, shared random effect, WAIC, ICL, DIC, LPML, AIC, BIC, out-of-sample validation

Published 4 November 2015