Statistics and Its Interface
Volume 8 (2015)
Detecting bacterial genomes in a metagenomic sample using NGS reads
Pages: 477 – 494
We use a nucleotide flipping technique on whole-genome next-generation sequencing (NGS) data to test for the presence of various bacterial strains in a single metagenomic sample. Our technique is novel in that we induce artificial point mutations at the nucleotide level to define a test statistic for each genome on a given reference list. After finding a suitable nucleotide flipping rate, we use a variant of the Westfall-Young procedure to correct for multiple comparisons. When we align reads to reference genomes, we permit fractional reads; that is, we weight the contribution of each read by one over the number of genomes to which it aligns. In a large-scale simulation, we characterize our method’s performance on “clean” data with respect to accuracy, genome length, and genome abundance. We then apply our technique to real data from the Human Microbiome Project (HMP) and compare our results, based on adjusted $p$-values, with the HMP findings based on abundance as assessed by coverage. The results from the two methods overlap substantially; the discrepancies can be explained by the inherent variability of the respective processing pipelines and data.
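The fractional-read weighting described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that alignments have already been reduced to a hypothetical mapping from each read ID to the set of reference genomes that read aligns to. The genome labels are illustrative placeholders. A read that aligns to k genomes adds 1/k to each of them, so every aligned read contributes a total weight of exactly one.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Sum fractional read weights per reference genome.

    `alignments` is a hypothetical input format: a dict mapping
    read_id -> iterable of genome ids the read aligns to.
    Each read contributes 1/k to each of the k genomes it hits;
    reads with no alignment contribute nothing.
    """
    counts = defaultdict(float)
    for read_id, genomes in alignments.items():
        hits = set(genomes)  # de-duplicate repeated hits to the same genome
        if not hits:
            continue  # unaligned read: no contribution
        weight = 1.0 / len(hits)
        for genome in hits:
            counts[genome] += weight
    return dict(counts)


# Illustrative placeholder data, not from the HMP.
alignments = {
    "r1": {"E_coli"},
    "r2": {"E_coli", "S_aureus"},
    "r3": {"E_coli", "S_aureus", "B_fragilis", "L_gasseri"},
    "r4": set(),
}
print(fractional_counts(alignments))
```

In this example, r1 counts fully toward E_coli, r2 is split evenly between two genomes, r3 is split four ways, and the unaligned r4 is ignored. E_coli therefore receives a weight of 1.75, and the weights across all genomes sum to 3.0, one for each aligned read.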
Keywords: metagenomics, next-generation sequencing, Human Microbiome Project, multiple comparisons, nucleotide flipping, artificial point mutations
2010 Mathematics Subject Classification
Primary 62G10, 62P10. Secondary 62-07.
Published 19 October 2015