Statistics and Its Interface

Volume 5 (2012)

Number 1

A review of statistical methods for protein identification using tandem mass spectrometry

Pages: 3 – 20



William Noble (Department of Genome Sciences, University of Washington, Seattle, Wash., U.S.A.)

Oliver Serang (Department of Genome Sciences, University of Washington, Seattle, Wash., U.S.A.)


Tandem mass spectrometry has emerged as a powerful tool for characterizing complex protein samples, an increasingly important problem in biology. The effort to perform inference efficiently and accurately on data from tandem mass spectrometry experiments has produced several statistical methods. We describe the predominant methods within a common framework and discuss them in detail. We classify these methods into three categories: set cover methods, iterative methods, and Bayesian methods. For each method, we analyze and evaluate the outcome and methodology of published comparisons to other methods, and we use these comparisons to comment on the strengths and weaknesses, as well as the overall utility, of each method. Finally, we discuss the similarities among these methods and suggest directions for the field that would unify their shared assumptions more rigorously and help enable efficient and reliable protein inference.


Keywords: mass spectrometry, proteomics, Bayesian methods


