Statistics and Its Interface

Volume 5 (2012)

Number 1

Spectral library searching for peptide identification in proteomics

Pages: 39 – 46

DOI: https://dx.doi.org/10.4310/SII.2012.v5.n1.a4

Author

Henry Lam (Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong)

Abstract

Spectral library searching is an emerging approach in peptide identification from tandem mass (MS/MS) spectra, a critical step in proteomic data analysis. Tandem mass spectrometry is the process by which peptides are fragmented by high energy in a mass spectrometer. The tandem mass spectra thus collected record the mass-to-charge ratios and abundances of the resulting fragments, and can be used to deduce the peptide sequence. Conceptually, spectral library searching is based on the premise that the fragmentation pattern of a peptide can be viewed as a reproducible fingerprint of that peptide, such that unknown spectra acquired under the same conditions can be identified by spectral matching. In practice, a spectral library is first meticulously compiled from a large collection of previously observed and identified MS/MS spectra, usually obtained from real proteomics experiments of complex mixtures. Then, a query spectrum is identified by spectral matching using recently developed spectral search engines. A key component of this method is a similarity scoring function that numerically defines the similarity between two spectra. In addition to the similarity score, various methods exist to evaluate the statistical significance of the match, and hence the identification accuracy. This review aims to introduce statisticians, especially those unfamiliar with proteomics data analysis, to this rapidly evolving field, and to provide a high-level description of the underlying algorithms and the outstanding challenges.
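To make the idea of a similarity scoring function concrete, the following is a minimal sketch of one common choice, a normalized dot product (cosine similarity) between binned spectra. The abstract does not name a specific scoring function, so this is an illustration and not the method of any particular search engine. The function names, the 1.0 m/z bin width, and the square-root intensity scaling are all assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into fixed-width m/z bins.

    Intensities are square-root scaled before summing, an assumed
    choice here that damps the influence of a few dominant peaks.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += math.sqrt(intensity)
    return binned


def dot_product_score(query_peaks, library_peaks, bin_width=1.0):
    """Return the cosine similarity of two spectra, in the range [0, 1]."""
    query = bin_spectrum(query_peaks, bin_width)
    library = bin_spectrum(library_peaks, bin_width)

    norm_query = math.sqrt(sum(v * v for v in query.values()))
    norm_library = math.sqrt(sum(v * v for v in library.values()))
    if norm_query == 0.0 or norm_library == 0.0:
        return 0.0  # an empty spectrum matches nothing

    # Only bins populated in both spectra contribute to the score.
    shared = sum(v * library[k] for k, v in query.items() if k in library)
    return shared / (norm_query * norm_library)


def best_match(query_peaks, library):
    """Pick the library entry (hypothetical dict: peptide -> peaks) with the top score."""
    scored = ((dot_product_score(query_peaks, peaks), peptide)
              for peptide, peaks in library.items())
    return max(scored, default=(0.0, None))
```

In this sketch a query spectrum is scored against every library entry and the highest-scoring peptide is reported. A real search engine would typically first restrict candidates by precursor mass and would then assess the statistical significance of the top score, as the abstract notes.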

Keywords

spectral libraries, spectral searching, proteomics, mass spectrometry

Published 17 February 2012