Statistics and Its Interface

Volume 8 (2015)

Number 2

Special Issue on Modern Bayesian Statistics (Part II)

Guest Editor: Ming-Hui Chen (University of Connecticut)

A Bayes testing approach to metagenomic profiling in bacteria

Pages: 173 – 185



Bertrand Clarke (Department of Statistics, University of Nebraska, Lincoln, Neb., U.S.A.)

Camilo Valdes (Center for Computational Sciences, University of Miami, Florida, U.S.A.)

Adrian Dobra (Department of Statistics, University of Washington, Seattle, Wash., U.S.A.)

Jennifer Clarke (Department of Food Science and Technology, University of Nebraska, Lincoln, Neb., U.S.A.)


Using next generation sequencing (NGS) data, we apply a multinomial model with a Dirichlet prior to detect the presence of bacteria in a metagenomic sample via marginal Bayes testing for each bacterial strain. The NGS reads per strain are counted fractionally, with each read contributing an equal amount to each strain it might represent. The threshold for detection is strain-dependent, and we correct for the dependence amongst the NGS reads by finding the knee in a curve representing the tradeoff between detecting too many strains and too few. As a check, we evaluate the joint posterior probabilities for the presence of pairs of bacterial strains and find relatively little dependence. We apply our techniques to two data sets and compare our results with those found by the Human Microbiome Project. We conclude with a discussion of the issues surrounding multiple-testing corrections in a Bayes context.
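The abstract does not give the exact form of the per-strain test or of the strain-dependent thresholds, so the following is only a minimal sketch under stated assumptions. It shows fractional read counting (a read compatible with k strains adds 1/k to each) and the conjugate Dirichlet-multinomial update, under which the marginal posterior for strain s is Beta(a_s + n_s, A + N - a_s - n_s), where A is the sum of the prior parameters and N the total count. The presence test used here, P(theta_s > t_s | data), and the function names `fractional_counts` and `posterior_presence_prob` are hypothetical illustrations, not the authors' procedure; the tail probability is estimated by Monte Carlo with the standard library's `random.betavariate` rather than computed exactly.

```python
import random
from collections import defaultdict


def fractional_counts(read_hits):
    """Split one unit of count per read equally among the strains it may represent.

    read_hits: list of lists; each inner list names the candidate strains for one read.
    """
    counts = defaultdict(float)
    for strains in read_hits:
        if not strains:
            continue  # an unmapped read contributes nothing
        share = 1.0 / len(strains)
        for s in strains:
            counts[s] += share
    return dict(counts)


def posterior_presence_prob(counts, strains, alpha, threshold, n_draws=20000, seed=0):
    """Hypothetical presence test: Monte Carlo estimate of P(theta_s > threshold[s] | data).

    Assumes a Dirichlet(alpha) prior over strain proportions and a multinomial
    likelihood applied to the (possibly non-integer) fractional counts, so the
    marginal posterior of theta_s is Beta(alpha_s + n_s, A + N - alpha_s - n_s).
    """
    rng = random.Random(seed)
    total_alpha = sum(alpha[s] for s in strains)
    total_n = sum(counts.get(s, 0.0) for s in strains)
    probs = {}
    for s in strains:
        a = alpha[s] + counts.get(s, 0.0)  # first Beta parameter for strain s
        b = total_alpha + total_n - a      # mass of all other strains
        exceed = sum(rng.betavariate(a, b) > threshold[s] for _ in range(n_draws))
        probs[s] = exceed / n_draws
    return probs


if __name__ == "__main__":
    # Toy reads: each inner list is the set of strains a read could have come from.
    reads = [["A"], ["A", "B"], ["B", "C"], ["A"], ["C"], ["A", "B", "C"]]
    counts = fractional_counts(reads)
    strains = ["A", "B", "C"]
    alpha = {s: 1.0 for s in strains}       # illustrative flat prior
    threshold = {s: 0.4 for s in strains}   # illustrative thresholds, not from the paper
    print(counts)
    print(posterior_presence_prob(counts, strains, alpha, threshold))
```

Because every mapped read contributes exactly one unit in total, the fractional counts always sum to the number of mapped reads, which keeps the Dirichlet update on the same scale as an ordinary count-based update. The knee-based correction for dependence among reads is not attempted here.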


Keywords

metagenomics, Bayes testing, bacteria, dependence

2010 Mathematics Subject Classification

Primary 62F15, 62P10. Secondary 62-07, 62F03.

Published 6 March 2015