Statistics and Its Interface

Volume 2 (2009)

Number 3

Alignment of protein mass spectrometry data by integrated Markov chain shifting method

Pages: 329 – 340



Yang Feng (Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey, U.S.A.)

Weiping Ma (Department of Mathematics, Fudan University, Shanghai, China)

Zhanfeng Wang (Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China)

Yaning Yang (Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China)

Zhiliang Ying (Department of Statistics, Columbia University, New York, N.Y., U.S.A.)


Mass spectrometers such as SELDI-TOF (surface-enhanced laser desorption/ionization time-of-flight) and MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight) measure the relative abundance of different protein ions or protein fragments (peptides) indexed by the mass-to-charge ratio (m/z). A special characteristic of MS spectra is their variability in both m/z values and intensity magnitudes. We propose modelling the log-intensities by a semiparametric model and the m/z values by the integrated Markov chain shifting (IMS) model, in which the second-order differences of the random effects are assumed to follow a second-order Markov chain. Spectra are aligned by averaging over the random shifts conditional on the observed intensity information. The unknown parameters are estimated by an iterative nonparametric maximum profile likelihood method and a Gaussian kernel approximation. The bandwidths in the kernel approximation are taken to be 0.04%–0.08% of the m/z values. Simulation results show that the proposed approach can achieve satisfactory alignment, reducing the intensity variation of the misaligned spectra by around 75%. Most alignment algorithms align spectra by clustering neighboring peaks and do not incorporate peak height information. Our semiparametric random shifting method builds a model that takes into consideration both the random shift effects of neighboring m/z values and the similarity of the intensity magnitudes of common peaks within ranges of about 50% of the intensity values.
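The bandwidth rule stated in the abstract (a Gaussian kernel whose bandwidth is 0.04%–0.08% of the m/z value) can be illustrated with a minimal sketch. The code below is not the authors' estimator: it applies the stated bandwidth rule to a generic Nadaraya-Watson kernel average, chosen here only for illustration. The function names, the default rate of 0.06%, and the toy data are all assumptions.

```python
import math


def proportional_bandwidth(mz, rate=0.0006):
    """Bandwidth as a fixed fraction of the m/z value.

    The abstract gives a range of 0.04%-0.08% (rate 0.0004-0.0008);
    the default of 0.0006 is an assumed midpoint, not a value from the paper.
    """
    return rate * mz


def gaussian_kernel(u):
    """Standard Gaussian density evaluated at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_average(mz_grid, intensities, target_mz, rate=0.0006):
    """Gaussian-kernel weighted average of intensities around target_mz.

    This is a generic Nadaraya-Watson smoother used as an illustration;
    it is NOT the profile likelihood approximation described in the paper.
    """
    h = proportional_bandwidth(target_mz, rate)
    weights = [gaussian_kernel((x - target_mz) / h) for x in mz_grid]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, intensities)) / total


if __name__ == "__main__":
    # Toy spectrum (hypothetical values) with a single peak at m/z 5000.
    grid = [4990.0, 4995.0, 5000.0, 5005.0, 5010.0]
    log_intensity = [1.0, 1.5, 3.0, 1.5, 1.0]
    print(proportional_bandwidth(5000.0))
    print(kernel_average(grid, log_intensity, 5000.0))
```

Under these assumptions the bandwidth grows linearly with m/z, so a point near m/z 5000 is smoothed over a window of roughly 3 m/z units, while a point near m/z 10000 would use roughly 6.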


Keywords

MS spectra, semiparametric model, Markov chain, integrated Markov chain shifting, profile likelihood

2010 Mathematics Subject Classification

Primary 62P10. Secondary 62Gxx, 92F05.

Published 1 January 2009