Communications in Information and Systems

Volume 10 (2010)

Number 2

Methods for Allocating Ambiguous Short-reads

Pages: 69 – 82

DOI: https://dx.doi.org/10.4310/CIS.2010.v10.n2.a1

Authors

Doron Lipson

Terence P. Speed

Margaret Taub

Abstract

With the rise in prominence of biological research using new short-read DNA sequencing technologies comes the need for new techniques for aligning these reads and assigning them to their genomic locations of origin. Until now, methods for allocating reads that align with equal or similar fidelity to multiple genomic locations have not been model-based and have tended to ignore potentially informative data. Here, we demonstrate that existing methods for assigning ambiguous reads can produce biased results. We then present new methods for allocating ambiguous reads to the genome, developed within a statistical modeling framework, which show promise in alleviating these biases in both simulated and real data.

Published 1 January 2010