RationaleRegulatory regions can be discovered by analyzing methylation patterns. One of the popular methods is to segment methylomes into distinct categories such as regions with low methylation or high methylation. The most popular method for categorical methylome segmentation is based on hidden Markov models (HMM) (an example is here). Another segmentation strategy is to use variants of change-point analysis, where change points in a signal across the genome is extracted and the genome is segmented to regions between two change points These methods are typically used in CNV (copy-number variation) detection but have applications in methylation context as well.
In the context of methylation, we first extract segments between change points of the signal (methylation profiles) then cluster those segments into groups based on the mean methylation values of the segments. In this case, a segment is a region of the genome where CpGs have similar methylation values. We cluster those segments into segment groups using mixture modeling. The segment groups correspond to highly methylated regions or lowly methylated regions. Since this is an unsupervised method, we can find more patterns than just highly or lowly methylated regions. The users have to further overlap the segments with other marks and annotation such as promoters, H3K27ac or other histone modifications to associate segment groups with functional annotation.
Use case and example
Below, we are using methylation data from Roadmap Epigenomics H1 cell line. The code demonstrates how to use the function and export the results to a BED file. Similarly, we can apply this function on differential methylation profiles, where values range between -100 and 100. Do help(methSeg) to see the help page for the function.