
CREATION OF DIFFERENTIALLY METHYLATED BACKGROUND GENE SET FOR GO ENRICHMENT

Source: Figshare (updated 2025-05-20; indexed 2026-04-08)
Download link:
https://figshare.com/articles/dataset/CREATION_OF_DIFFERENTIALLY_METHYLATED_BACKGROUND_GENE_SET_FOR_GO_ENRICHMENT/29108138/1
Resource summary:
SUMMARY
This script prepares a background gene set for GO enrichment analysis based on differentially methylated CpGs in Nasonia vitripennis from an ageing (diapause) methylation dataset. It filters CpGs, maps them to genomic features, and annotates them with gene identifiers (gene_id) using exon/intron reference data. GO annotations are then assigned to produce a background set for downstream enrichment testing.

KEY STEPS
1. Load Erin's methylation data (erindata_v2.txt).
2. Filter for CpGs in the ageing-related list of differentially methylated loci (DMLs), using timepoint DMLs only.
3. Extract the chromosome and position from the CpG row identifiers.
4. Join the CpG positions with the exon/intron gene annotations for the Nvit genome.
5. Retain only the CpGs that overlap gene features.
6. Assign GO terms to the mapped gene_ids using the Nasonia GO file.
7. Write the resulting gene-GO annotation file as the background set.

INPUT FILES
- erindata_v2.txt: Erin's filtered methylation matrix
- un.dmls.timepoint.txt: differentially methylated CpG list (timepoint only)
- GCF_009193385.2_Nvit_psr_1.1_genomic_numbered_exons.txt: Nasonia exon/intron annotation
- Nasonia_PSR1.1_Gene_GO.txt: GO annotations for Nasonia genes

OUTPUT FILES
- diff_methylatedcpgs_with_LOC.csv: CpGs mapped to annotated gene features
- diff_backgroundGOannotations.csv: final gene-GO background used for enrichment

SOFTWARE REQUIREMENTS
- R packages: data.table, dplyr, tidyr, readr, sqldf, GOstats, GSEABase, treemap
- The input genome annotation must contain numbered exon/intron coordinates and gene_id mappings.

NOTES
- The GO evidence code is set to "IEA" for all terms.
- CpGs that do not overlap gene features are excluded from the background.
- This background set is later used in the GO enrichment scripts (see the Table S4 analysis).

CONTACT
Eamonn Mallon, ebm3@le.ac.uk
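Key steps 2-3 (filter to the timepoint DMLs, then pull the chromosome and position out of each CpG identifier) can be sketched as follows. The original script is written in R. This Python version is only an illustration under stated assumptions. The identifier format "<chromosome>.<position>" is hypothetical, because the page does not say how the rows of erindata_v2.txt are named. The function names and the toy IDs are also made up. The split is done on the last dot, so accession-style chromosome names that themselves contain a dot stay intact.

```python
# Minimal sketch of key steps 2-3 (hypothetical ID format, see lead-in).
# ASSUMPTION: CpG row identifiers look like "<chromosome>.<position>",
# e.g. the made-up "NC_0001.1.250"; the real erindata_v2.txt format may differ.


def split_cpg_id(cpg_id: str) -> tuple[str, int]:
    """Split on the LAST dot so a chromosome name like "NC_0001.1" survives."""
    chrom, pos = cpg_id.rsplit(".", 1)
    return chrom, int(pos)


def filter_and_locate(cpg_ids: list[str], dml_ids: set[str]) -> list[tuple[str, str, int]]:
    """Return (cpg_id, chromosome, position) for each CpG that is in the DML set."""
    located = []
    for cpg_id in cpg_ids:
        if cpg_id in dml_ids:  # step 2: keep timepoint DMLs only
            chrom, pos = split_cpg_id(cpg_id)  # step 3: extract coordinates
            located.append((cpg_id, chrom, pos))
    return located


if __name__ == "__main__":
    all_cpgs = ["NC_0001.1.100", "NC_0001.1.250", "NC_0002.1.40"]  # toy data
    dmls = {"NC_0001.1.250", "NC_0002.1.40"}
    print(filter_and_locate(all_cpgs, dmls))
```

Input order is preserved, so the output follows the row order of the methylation matrix.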
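Key steps 4-5 amount to a range join: a CpG belongs to a feature when it is on the same chromosome and its position lies between the feature's start and end. The package list includes sqldf, which runs SQL on SQLite by default. That suggests, but does not confirm, that the original script does this with an SQL query. The sketch below expresses the same idea with Python's built-in sqlite3 module. The feature schema is a hypothetical stand-in for the numbered exon/intron file: (chrom, start_pos, end_pos, gene_id, feature), with inclusive coordinates. The gene IDs and feature labels are made up.

```python
import sqlite3

# Minimal sketch of key steps 4-5 (hypothetical schema, see lead-in).
# An INNER JOIN keeps only CpGs that fall inside at least one feature,
# which is what step 5 asks for.


def map_cpgs_to_genes(cpgs, features):
    """Map CpGs to the gene features that contain them.

    cpgs:     iterable of (cpg_id, chrom, pos)
    features: iterable of (chrom, start_pos, end_pos, gene_id, feature),
              ASSUMED inclusive and on the same coordinate scale as pos.
    Returns a list of (cpg_id, chrom, pos, gene_id, feature) tuples.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cpgs (cpg_id TEXT, chrom TEXT, pos INTEGER)")
        con.execute(
            "CREATE TABLE features ("
            "chrom TEXT, start_pos INTEGER, end_pos INTEGER, gene_id TEXT, feature TEXT)"
        )
        con.executemany("INSERT INTO cpgs VALUES (?, ?, ?)", cpgs)
        con.executemany("INSERT INTO features VALUES (?, ?, ?, ?, ?)", features)
        # BETWEEN is inclusive at both ends in SQLite.
        return con.execute(
            """
            SELECT c.cpg_id, c.chrom, c.pos, f.gene_id, f.feature
            FROM cpgs AS c
            JOIN features AS f
              ON c.chrom = f.chrom
             AND c.pos BETWEEN f.start_pos AND f.end_pos
            ORDER BY c.chrom, c.pos, f.gene_id, f.feature
            """
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    toy_cpgs = [("NC_0001.1.250", "NC_0001.1", 250), ("NC_0001.1.900", "NC_0001.1", 900)]
    toy_features = [("NC_0001.1", 200, 300, "LOC_A", "exon1"),
                    ("NC_0001.1", 301, 800, "LOC_A", "intron1")]
    print(map_cpgs_to_genes(toy_cpgs, toy_features))  # the CpG at 900 is dropped
```

A CpG that falls inside overlapping features (for example, two genes sharing a region) produces one row per feature. Collapsing to unique gene_ids is left to the next step.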
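Key steps 6-7, together with the note that every term gets evidence code "IEA", can be sketched as a filter-and-tag over a gene-to-GO mapping. Again this is Python standing in for the R original. The two-column (gene_id, go_id) layout of the GO file and the CSV header are assumptions. The row order (GO id, evidence code, gene id) follows the three-column layout commonly passed to GSEABase's GOFrame(). Only genes that survived the overlap step and have at least one GO term end up in the background.

```python
import csv
import io

# Minimal sketch of key steps 6-7 (hypothetical GO-file layout, see lead-in).
# ASSUMPTION: the GO annotations are (gene_id, go_id) pairs.


def build_go_background(mapped_gene_ids, gene_to_go):
    """Return sorted, de-duplicated (go_id, "IEA", gene_id) rows.

    Genes are kept only if they were mapped to a CpG-overlapping feature;
    every row gets evidence code "IEA", as stated in the NOTES.
    """
    genes = set(mapped_gene_ids)
    rows = {(go_id, "IEA", gene_id) for gene_id, go_id in gene_to_go if gene_id in genes}
    return sorted(rows)


def to_csv(rows):
    """Serialise rows as CSV text; the header names are hypothetical."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["go_id", "evidence", "gene_id"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    mapped = ["LOC_A", "LOC_A", "LOC_B"]  # toy gene_ids from the overlap step
    go_pairs = [("LOC_A", "GO:0008150"), ("LOC_A", "GO:0003674"), ("LOC_C", "GO:0005575")]
    print(to_csv(build_go_background(mapped, go_pairs)), end="")
```

In this toy run LOC_B is dropped because it has no GO terms, and LOC_C is dropped because no differentially methylated CpG mapped to it.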

Provider:
Mallon, Eamonn
Created:
2025-05-20