Statistics and Its Interface

Volume 11 (2018)

Number 4

Addressing varying non-ignorable missing data mechanisms using a penalized EM algorithm: application to quantitative proteomics data

Pages: 581 – 586



So Young Ryu (School of Community Health Sciences, University of Nevada, Reno, Nv., U.S.A.)


In multi-laboratory collaborative or large-scale proteomic studies, it is challenging to analyze data properly due to varying non-ignorable missing data mechanisms across experiments. PEMM (Penalized EM algorithm incorporating missing data mechanism) proposed by Chen, Prentice and Wang [1] estimates both the mean and the covariance of protein abundances in the presence of non-ignorable missing data; however, PEMM assumes a common missing mechanism for all experiments. This approach may be adequate when experiments are performed under similar conditions, but it may not work optimally when experiments are conducted in different laboratories or over a long period of time. In this paper, we extend PEMM to appropriately handle varying missing data mechanisms for datasets generated at multiple laboratories. Recognizing that jointly estimating missing mechanisms and parameters of interest is a challenging task, we assume that missing data mechanisms are known, and demonstrate benefits of incorporating multiple missing mechanisms for datasets generated at different laboratories. We call our algorithm PEMvM (Penalized EM algorithm for varying non-ignorable missing mechanisms). Our extension is simple and enjoys all the properties that PEMM offers. When missing data mechanisms differ across experiments, PEMvM performs better than PEMM in terms of accurate mean estimation and data imputation. In this paper, we demonstrate the performance of PEMvM using both simulated and real proteomic data.
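The motivation for varying mechanisms can be illustrated with a small simulation. The following is a minimal sketch, not PEMvM itself and not code from the paper: the logistic form of the missingness probability, the parameter names `gamma0` and `gamma1`, and the two hypothetical laboratories are all assumptions made for illustration. It shows that when low abundances are more likely to be missing, the mean of the observed values is biased upward, and that the size of this bias depends on each laboratory's mechanism.

```python
import math
import random


def simulate_lab(n, mean, sd, gamma0, gamma1, rng):
    """Draw n log-abundances and mask them with an abundance-dependent mechanism.

    Assumed (illustrative) mechanism:
        P(missing | x) = 1 / (1 + exp(gamma0 + gamma1 * x))
    so lower abundances are more likely to be missing when gamma1 > 0.
    Missing values are represented by None.
    """
    values = []
    for _ in range(n):
        x = rng.gauss(mean, sd)
        p_missing = 1.0 / (1.0 + math.exp(gamma0 + gamma1 * x))
        values.append(None if rng.random() < p_missing else x)
    return values


def observed_mean(values):
    """Mean of the non-missing entries only (ignores the missing mechanism)."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def missing_fraction(values):
    """Share of entries that are missing."""
    return sum(v is None for v in values) / len(values)


rng = random.Random(0)
true_mean = 0.0

# Two hypothetical laboratories measuring the same protein,
# each with its own (assumed known) missing data mechanism.
lab_a = simulate_lab(20000, true_mean, 1.0, gamma0=3.0, gamma1=0.5, rng=rng)
lab_b = simulate_lab(20000, true_mean, 1.0, gamma0=0.0, gamma1=2.0, rng=rng)

bias_a = observed_mean(lab_a) - true_mean
bias_b = observed_mean(lab_b) - true_mean

print(f"lab A: missing {missing_fraction(lab_a):.2%}, bias {bias_a:+.3f}")
print(f"lab B: missing {missing_fraction(lab_b):.2%}, bias {bias_b:+.3f}")
```

Because the two laboratories distort the observed data to different degrees in this sketch, a single common mechanism cannot correct both, which is the situation the abstract describes.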


mass spectrometry, proteomics, protein relative quantitation, non-ignorable missing data, varying missing data mechanisms


This publication was made possible by a grant from the National Institute of General Medical Sciences (GM103440) of the National Institutes of Health.

Received 28 December 2017

Published 19 September 2018