A Method to Detect Differentially Methylated Loci With Next-Generation Sequencing


Correspondence to: Hongyan Xu, Department of Biostatistics and Epidemiology, Georgia Health Sciences University, 1120 15th Street, Augusta, GA 30912-4900. E-mail: hxu@gru.edu


Epigenetic changes, especially DNA methylation at CpG loci, have important implications in cancer and other complex diseases. With the development of next-generation sequencing (NGS), it is feasible to generate data to interrogate differences in methylation status at loci genome-wide using a case-control design. However, a proper and efficient statistical test is lacking. There are several challenges. First, unlike methylation experiments using microarrays, where there is one measure of methylation per individual at a particular CpG site, NGS yields counts of methylated and unmethylated alleles for each individual. Second, owing to the nature of sample preparation, the measured methylation reflects the methylation status of a mixture of cells. Therefore, the underlying distribution of the measured methylation level is unknown, and a robust test is more desirable than a parametric approach. Third, NGS currently measures methylation at over 2 million CpG sites, so any statistical test has to be computationally efficient to be applied to NGS data. Taking these challenges into account, we propose a test for differential methylation based on clustered data analysis that models the methylation counts. Simulations show that the test is robust under several distributions for the measured methylation levels, has good power, and is computationally efficient. Finally, we apply the test to our NGS data on chronic lymphocytic leukemia. The results indicate that it is a promising and practical test.
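To make the data structure concrete, the following is a minimal sketch of one generic clustered-data test for a single CpG site, in which each individual is treated as a cluster of reads. It is not necessarily the test proposed here: it assumes a ratio-estimator (cluster-robust) variance for the pooled methylation proportion and a normal approximation for the resulting statistic. The function names and the `(methylated, total)` input layout are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical layout: one (methylated reads, total reads) pair per individual.
Counts = List[Tuple[int, int]]


def pooled_proportion_and_variance(group: Counts) -> Tuple[float, float]:
    """Pooled methylation proportion and its ratio-estimator variance.

    Each individual is one cluster, so no distribution is assumed for the
    individual-level methylation levels.
    """
    k = len(group)
    if k < 2:
        raise ValueError("need at least two individuals per group")
    total_reads = sum(n for _, n in group)
    if total_reads == 0:
        raise ValueError("group has no reads at this site")
    p = sum(m for m, _ in group) / total_reads
    # Sum of squared cluster-level residuals around the pooled proportion.
    ss = sum((m - n * p) ** 2 for m, n in group)
    variance = (k / (k - 1)) * ss / total_reads ** 2
    return p, variance


def clustered_methylation_test(cases: Counts, controls: Counts) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for case vs. control."""
    p_case, v_case = pooled_proportion_and_variance(cases)
    p_ctrl, v_ctrl = pooled_proportion_and_variance(controls)
    se = math.sqrt(v_case + v_ctrl)
    if se == 0.0:
        raise ValueError("zero estimated variance; statistic is undefined")
    z = (p_case - p_ctrl) / se
    # Two-sided p-value under the assumed normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # Toy counts, invented purely to exercise the function.
    toy_cases = [(8, 10), (9, 10), (7, 10)]
    toy_controls = [(2, 10), (3, 10), (1, 10)]
    print(clustered_methylation_test(toy_cases, toy_controls))
```

Because the statistic needs only a few sums per site, a sketch of this kind costs time linear in the number of individuals at each locus, which is the sort of per-site cost that makes scanning millions of CpG sites practical.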