Statistics and Its Interface
Volume 5 (2012)
Spectral library searching for peptide identification in proteomics
Pages: 39 – 46
Spectral library searching is an emerging approach in peptide identification from tandem mass (MS/MS) spectra, a critical step in proteomic data analysis. Tandem mass spectrometry is the process by which peptides are fragmented by high energy in a mass spectrometer. The tandem mass spectra thus collected record the mass-to-charge ratios and abundances of the resulting fragments, and can be used to deduce the peptide sequence. Conceptually, spectral library searching is based on the premise that the fragmentation pattern of a peptide can be viewed as a reproducible fingerprint of that peptide, such that unknown spectra acquired under the same conditions can be identified by spectral matching. In practice, a spectral library is first meticulously compiled from a large collection of previously observed and identified MS/MS spectra, usually obtained from real proteomics experiments on complex mixtures. Then, a query spectrum is identified by matching it against the library using recently developed spectral search engines. A key component of this method is a similarity scoring function that numerically defines the similarity between two spectra. In addition to the similarity score, various methods exist to evaluate the statistical significance of the match, and hence the identification accuracy. This review aims to introduce statisticians, especially those unfamiliar with proteomics data analysis, to this rapidly evolving field, and to provide a high-level description of the underlying algorithms and the outstanding challenges.
Keywords: spectral libraries, spectral searching, proteomics, mass spectrometry
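
To make the idea of a similarity scoring function concrete, here is a minimal sketch in Python. The abstract does not name a particular score, so the choice here is an assumption: a normalized dot product (cosine similarity) over binned, square-root-scaled peak intensities, which is one commonly used form. The function names, the 1.0 m/z bin width, and the toy peak lists are all hypothetical and serve only as illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Collapse (m/z, intensity) peaks into fixed-width m/z bins.

    Intensities are square-root scaled before summing, an assumed
    (but common) way to damp the influence of a few dominant peaks.
    """
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += math.sqrt(intensity)
    return bins


def dot_product_score(query_peaks, library_peaks, bin_width=1.0):
    """Normalized dot product between two binned spectra, in [0, 1]."""
    q = bin_spectrum(query_peaks, bin_width)
    lib = bin_spectrum(library_peaks, bin_width)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_lib = math.sqrt(sum(v * v for v in lib.values()))
    if norm_q == 0.0 or norm_lib == 0.0:
        return 0.0
    shared = sum(v * lib[k] for k, v in q.items() if k in lib)
    return shared / (norm_q * norm_lib)


def search_library(query_peaks, library, bin_width=1.0):
    """Score the query against every library entry, best match first."""
    scored = [
        (peptide, dot_product_score(query_peaks, peaks, bin_width))
        for peptide, peaks in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy library with made-up peak lists (not real peptide spectra).
library = {
    "PEPTIDE_A": [(175.1, 100.0), (262.2, 40.0), (375.3, 80.0)],
    "PEPTIDE_B": [(147.1, 90.0), (300.4, 60.0), (401.5, 30.0)],
}
# Hypothetical query: a noisy re-acquisition of the first entry.
query = [(175.1, 95.0), (262.3, 35.0), (375.2, 85.0), (500.0, 5.0)]

for peptide, score in search_library(query, library):
    print(f"{peptide}: {score:.3f}")
```

The ranked scores alone do not say whether the top match is correct. Assessing the statistical significance of a match, as the abstract notes, is a separate step that this sketch does not attempt.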