METHYLATION SITE ANALYSIS USING methylKit
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/METHYLATION_SITE_ANALYSIS_USING_methylKit/29120102
Resource description:
SUMMARY
This script processes bisulfite sequencing data to identify significantly methylated CpG sites
across multiple developmental timepoints and conditions in Nasonia vitripennis. It performs:
- Read filtering
- Merging methylation data
- Binomial testing per sample
- FDR correction
- Site selection
- Methylation profile visualization
ORIGIN
Script developed using `methylKit` for differential methylation analysis on whole-genome bisulfite
data from the Erin Diapause project.
KEY STEPS
1. Read methylation data (`*.cov` files) using `methRead()`
2. Filter CpGs by minimum and maximum coverage thresholds
3. Merge all samples into a common methylation object using `unite()`
4. Conduct one-tailed binomial tests per CpG site for each sample (alternative hypothesis: methylation proportion greater than the background rate)
5. Apply Benjamini-Hochberg FDR correction per sample
6. Filter for significant sites (FDR < 0.05)
7. Collate significant sites across all samples
8. Extract a methylBase object from the merged sites
9. Export CpG-level data to `Total_Methylated_Bases.txt`
10. Generate correlation, clustering, and PCA plots across all retained samples
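
Steps 4 to 6 can be illustrated with a minimal base-R sketch. It is not the original script: the helper name `call_methylated_sites`, the toy counts, and the background rate of 0.005 are assumptions made for illustration, and the description does not state how the background rate is derived. The one-tailed p-value is computed with `pbinom()`, which gives the same result as `binom.test(..., alternative = "greater")`.

```r
# Hypothetical helper (not from the original script): per-sample one-tailed
# binomial test followed by Benjamini-Hochberg correction.
#   num_c      methylated read count at each CpG
#   cov        total read coverage at each CpG
#   background assumed background methylation rate for this sample
call_methylated_sites <- function(num_c, cov, background, fdr = 0.05) {
  stopifnot(length(num_c) == length(cov), all(num_c <= cov))
  # P(X >= num_c) under Binomial(cov, background)
  p <- pbinom(num_c - 1, size = cov, prob = background, lower.tail = FALSE)
  q <- p.adjust(p, method = "BH")
  data.frame(p_value = p, fdr = q, significant = q < fdr)
}

# Toy data for four CpGs in one sample (values are made up)
res <- call_methylated_sites(
  num_c      = c(0, 1, 12, 20),
  cov        = c(20, 25, 20, 20),
  background = 0.005
)
print(res)
```

Because the correction is applied within each sample, the FDR is controlled per sample rather than across the whole experiment.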
INPUT FILES
- `*.CpG_report.merged_CpG_evidence.cov`: Methylation files from Bismark (one per sample)
SAMPLES
- 40 samples representing combinations of treatment (Control/Diapause) and timepoint (6d to 30d), with replicates
- Sample names: D6C1, D6D1, D12C1, ..., D6C4
- Treatment vector: `treat_conditions` defines relative order
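
As an illustration of how a treatment vector could be derived from the sample names, the sketch below assumes the names follow the pattern `D<day><C|D><replicate>` (for example, `D6C1` read as day 6, Control, replicate 1). That reading of the names and the 0/1 coding are assumptions; only the sample names listed above are used, and the original `treat_conditions` may be defined differently.

```r
# Only the sample names given in the description; the full list has 40 entries.
sample_ids <- c("D6C1", "D6D1", "D12C1", "D6C4")

# Assumed naming pattern: D<day><C|D><replicate>
parts <- regmatches(sample_ids,
                    regexec("^D([0-9]+)([CD])([0-9]+)$", sample_ids))

meta <- data.frame(
  sample_id = sample_ids,
  day       = as.integer(sapply(parts, `[`, 2)),
  condition = ifelse(sapply(parts, `[`, 3) == "C", "Control", "Diapause"),
  replicate = as.integer(sapply(parts, `[`, 4))
)

# Assumed coding: 0 = Control, 1 = Diapause
treat_conditions <- ifelse(meta$condition == "Diapause", 1L, 0L)
print(meta)
```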
OUTPUT FILES
- `step1meth.RData`, `step2meth.RData`, `step3meth.RData`, `finalmeth.RData`: R environments at major steps
- `Total_Methylated_Bases.txt`: Final table of significant methylated sites across all samples
- `CpG_Correlation.pdf`: Pairwise methylation correlation matrix
- `CpG_Cluster.pdf`: Hierarchical clustering dendrogram
- `CpG_PCA.pdf`: PCA plot of CpG methylation patterns
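
The export and plotting outputs could be produced along the following lines. This is a hedged sketch, not the original code: the function name `export_and_plot` and its arguments are hypothetical, `meth` is assumed to be a `methylBase` object returned by `unite()`, and `sig_rows` is assumed to hold the row indices of the significant sites. It requires the Bioconductor package `methylKit`.

```r
# Hypothetical wrapper: subset the united object to significant sites,
# export the table, and write the three diagnostic plots.
export_and_plot <- function(meth, sig_rows, out_dir = ".") {
  # Keep only the rows (CpGs) flagged as significant
  sig_meth <- methylKit::select(meth, sig_rows)

  # CpG-level table of coverage and methylated/unmethylated counts
  utils::write.table(methylKit::getData(sig_meth),
                     file.path(out_dir, "Total_Methylated_Bases.txt"),
                     sep = "\t", quote = FALSE, row.names = FALSE)

  grDevices::pdf(file.path(out_dir, "CpG_Correlation.pdf"))
  methylKit::getCorrelation(sig_meth, plot = TRUE)
  grDevices::dev.off()

  grDevices::pdf(file.path(out_dir, "CpG_Cluster.pdf"))
  methylKit::clusterSamples(sig_meth, dist = "correlation", plot = TRUE)
  grDevices::dev.off()

  grDevices::pdf(file.path(out_dir, "CpG_PCA.pdf"))
  methylKit::PCASamples(sig_meth)
  grDevices::dev.off()

  invisible(sig_meth)
}
```

With 40 samples the pairwise correlation plot becomes very dense, so a larger PDF page size (the `width` and `height` arguments of `pdf()`) may be worth setting.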
SOFTWARE REQUIREMENTS
- R package: `methylKit` (v1.22.1+ recommended)
- Additional packages: `stats`, `utils`, `grDevices` (base R)
NOTES
- Filtering is performed with `mincov=10` and high-coverage threshold at 99.9th percentile
- Binomial tests are one-tailed (test for greater-than-background methylation)
- Each sample uses a slightly adjusted p-value threshold (e.g., 0.004–0.005)
- Data are not destranded or normalized at this stage
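
The filtering settings in these notes map onto `methylKit` calls roughly as sketched below. The wrapper name `read_filter_unite`, its arguments, and the assembly label are hypothetical, and the original script may pass different options. The sketch assumes the input files are in Bismark coverage format, which `methRead()` handles with `pipeline = "bismarkCoverage"`.

```r
# Hypothetical wrapper around the read, filter and merge steps.
# Requires the Bioconductor package methylKit.
read_filter_unite <- function(cov_files, sample_ids, treat_conditions,
                              assembly = "Nvit") {  # placeholder assembly label
  raw <- methylKit::methRead(
    location  = as.list(cov_files),
    sample.id = as.list(sample_ids),
    assembly  = assembly,
    treatment = treat_conditions,
    context   = "CpG",
    pipeline  = "bismarkCoverage",
    mincov    = 10
  )

  # Drop CpGs with coverage below 10 reads or above the 99.9th percentile
  filtered <- methylKit::filterByCoverage(raw,
                                          lo.count = 10, lo.perc = NULL,
                                          hi.count = NULL, hi.perc = 99.9)

  # destrand = FALSE and no normalizeCoverage() call, in line with the note
  # that data are not destranded or normalized at this stage
  methylKit::unite(filtered, destrand = FALSE)
}
```

By default `unite()` keeps only CpGs covered in every sample, so the merged object shrinks as samples are added.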
LIMITATIONS
- Statistical tests are applied per sample; no groupwise differential analysis is performed here
- Only CpGs passing FDR < 0.05 are retained
- Each of the 40 samples is processed manually, one at a time; the per-sample steps could be automated for scalability
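
One way the per-sample processing might be automated is to loop over samples with `lapply()`. The sketch below uses base R only and made-up counts; the matrix layout, the per-sample background rates, and the choice to collate significant sites as a union across samples are all assumptions, not details taken from the original script.

```r
# Toy counts: rows = CpG sites, columns = samples (values are made up)
num_c <- cbind(S1 = c(0, 15, 1), S2 = c(18, 0, 0))    # methylated reads
cov   <- cbind(S1 = c(20, 20, 30), S2 = c(20, 25, 30)) # total coverage
background <- c(S1 = 0.004, S2 = 0.005)                # placeholder rates

# Run the one-tailed binomial test and BH correction for every sample
sig_by_sample <- lapply(colnames(num_c), function(s) {
  p <- pbinom(num_c[, s] - 1, size = cov[, s], prob = background[[s]],
              lower.tail = FALSE)
  which(p.adjust(p, method = "BH") < 0.05)
})
names(sig_by_sample) <- colnames(num_c)

# Collate across samples; taking the union is an assumption
sig_sites <- sort(unique(unlist(sig_by_sample)))
print(sig_sites)
```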
CONTACT
Eamonn Mallon
ebm3@le.ac.uk
Created:
2025-05-21



