Statistics and Its Interface

Volume 4 (2011)

Number 3

Practical consideration of genotype imputation: Sample size, window size, reference choice, and untyped rate

Pages: 339 – 351

DOI: http://dx.doi.org/10.4310/SII.2011.v4.n3.a8

Authors

Guimin Gao (Department of Biostatistics, School of Medicine, Virginia Commonwealth University, Richmond, Va., U.S.A.)

Nita A. Limdi (Department of Neurology, University of Alabama at Birmingham, U.S.A.)

Nianjun Liu (Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, U.S.A.)

Boshao Zhang (Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, U.S.A.)

Kui Zhang (Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, U.S.A.)

Degui Zhi (Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, U.S.A.)

Abstract

Imputation offers a promising way to infer missing and/or untyped genotypes in genetic studies. In practice, however, many factors may affect the quality of imputation. In this study, we evaluated how imputation quality is influenced by the untyped rate, the sizes of the study sample and the reference sample, the window size, and the choice of reference (for admixed populations). The results show that, to obtain good imputation quality, it is necessary to have an untyped rate less than 50%, a reference sample size greater than 50, and a window size greater than 500 SNPs (roughly 1 Mb). Compared with whole-region imputation, piecewise imputation with large enough window sizes provides improved efficacy. For an admixed study sample, if only an external reference panel is used, it should include samples from the ancestral populations that represent the admixed population under investigation. Internal references are strongly recommended. When internal references are limited, however, augmentation by external references should be used carefully. More specifically, augmentation with samples from the major source populations of the admixture can lower the quality of imputation, whereas augmentation with seemingly genetically unrelated cohorts may improve it.
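
The numeric guidelines above can be read as a simple pre-imputation check. The following Python sketch is illustrative only and is not the authors' software: the function names, the parameter names, and the choice of non-overlapping windows with the remainder merged into the last window are all assumptions made for this example. It performs no imputation; it only flags settings outside the stated thresholds and partitions a region into windows of at least the requested number of SNPs.

```python
from typing import List, Tuple

# Thresholds quoted in the abstract. All names below are hypothetical.
MAX_UNTYPED_RATE = 0.50   # untyped rate should be less than 50%
MIN_REFERENCE_SIZE = 50   # reference sample size should be greater than 50
MIN_WINDOW_SNPS = 500     # window size should be greater than 500 SNPs


def check_design(untyped_rate: float, reference_size: int,
                 window_snps: int) -> List[str]:
    """Return warnings for settings outside the abstract's guidelines."""
    warnings = []
    if untyped_rate >= MAX_UNTYPED_RATE:
        warnings.append("untyped rate should be below 50%")
    if reference_size <= MIN_REFERENCE_SIZE:
        warnings.append("reference sample size should exceed 50")
    if window_snps <= MIN_WINDOW_SNPS:
        warnings.append("window size should exceed 500 SNPs")
    return warnings


def split_into_windows(n_snps: int, window_snps: int) -> List[Tuple[int, int]]:
    """Split SNP indices 0..n_snps-1 into consecutive half-open windows.

    Assumption for illustration: windows do not overlap, and a short
    trailing remainder is merged into the last window so that no window
    is smaller than window_snps (unless the whole region is smaller).
    """
    if n_snps <= 0 or window_snps <= 0:
        raise ValueError("n_snps and window_snps must be positive")
    windows = []
    start = 0
    # Cut a full window only while enough SNPs remain for another one.
    while n_snps - start >= 2 * window_snps:
        windows.append((start, start + window_snps))
        start += window_snps
    windows.append((start, n_snps))
    return windows
```

For example, `check_design(0.3, 100, 600)` returns an empty list, while `split_into_windows(1700, 600)` yields two windows, the second absorbing the 500-SNP remainder.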

Keywords

genotype imputation, genetic study, admixed population, untyped rate, window size, reference
