Statistics and Its Interface
Volume 10 (2017)
Correcting length-bias in gene set analysis for DNA methylation data
Pages: 279 – 289
Enrichment analysis of pre-defined gene sets is widely used to extract functional information in association studies. However, traditional methods give biased results on genome-wide DNA methylation data because genes differ in the number of probes that map to them. In this article, we present MethylSet, a novel two-step procedure that combines gene-based association analysis with a logistic regression model for enrichment analysis to correct the bias induced by gene size. Adjusting for the gene size effect is crucial because irrelevant gene sets may otherwise be identified. Our simulation studies showed that MethylSet has a well-controlled type I error rate and promising statistical power. When applied to a real DNA methylation data set, MethylSet identified meaningful gene sets associated with the studied disease outcome.
Keywords: epigenome-wide association study (EWAS), length bias, logistic kernel machine regression, gene set analysis
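
The length-bias problem described in the abstract can be illustrated with a small simulation. The sketch below is not the MethylSet implementation: the abstract does not give the model specification, so the simulated gene-level significance calls (a stand-in for the gene-based association step), the use of the standardized log probe count as the size covariate, and all names such as `fit_logistic`, `in_set`, and `n_probes` are assumptions made for illustration only. It fits a logistic regression of gene significance on gene-set membership, first without and then with the gene-size covariate, on data where membership has no true effect.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns the coefficients.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                 # score vector
        hess = (X * w[:, None]).T @ X        # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(0)
n_genes = 20000

# Hypothetical probe counts per gene (assumption: roughly log-normal).
n_probes = np.round(np.exp(rng.normal(2.5, 0.8, n_genes))).astype(int) + 1
log_size = np.log(n_probes)
z = (log_size - log_size.mean()) / log_size.std()  # standardized gene size

# Stand-in for step 1 (gene-based association): genes with more probes are
# more likely to be called significant, independent of set membership.
significant = (rng.random(n_genes) < sigmoid(-2.0 + z)).astype(float)

# Gene-set membership also depends on gene size, but has NO true effect
# on significance once size is accounted for.
in_set = (rng.random(n_genes) < sigmoid(-2.0 + z)).astype(float)

ones = np.ones(n_genes)

# Naive enrichment model: significance ~ membership.
beta_naive = fit_logistic(np.column_stack([ones, in_set]), significant)[1]

# Size-adjusted enrichment model: significance ~ membership + gene size.
beta_adj = fit_logistic(np.column_stack([ones, in_set, z]), significant)[1]

print(f"membership log-odds ratio, unadjusted: {beta_naive:.3f}")
print(f"membership log-odds ratio, size-adjusted: {beta_adj:.3f}")
```

Under these assumptions, the unadjusted membership coefficient is clearly positive even though the set is irrelevant by construction, while the size-adjusted coefficient stays close to zero. This is the kind of spurious enrichment that an adjustment for gene size is meant to remove.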