Nanopore R9.4.1 NA12878 methylation frequencies
Source: Mendeley Data. Updated 2024-01-31; indexed 2024-06-29.
Download link:
https://figshare.com/articles/dataset/Nanopore_R9_4_1_NA12878_methylation_frequencies/21543330/1
Description:
This dataset contains methylation frequencies for NA12878, computed from the complete public nanopore dataset available at https://github.com/nanopore-wgs-consortium/NA12878/blob/master/Genome.md. Raw nanopore signals for ~130 Gbp of data (~40x coverage) were downloaded and converted to BLOW5 format using slow5tools. They were then basecalled using buttery-eel with Guppy 6.3.7 in high-accuracy mode. Reads that passed the qscore filter (>7) were mapped to the hg38noAlt genome using minimap2 2.17. Methylation calling was then performed using f5c 1.1. Finally, the methylation frequencies output by f5c in TSV format were converted to bigwig format.
The commands used for each step were as follows:
1. Basecalling:
buttery-eel -i na12878_dna_merged.blow5 --guppy_bin /install/ont-guppy-6.3.7/bin/ --config dna_r9.4.1_450bps_hac.cfg -x cuda:all -q 7 -o reads.fastq --port 5555 --use_tcp
2. Alignment and BAM file processing:
minimap2 -ax map-ont -t40 --secondary=no /genome/hg38noAlt.idx na12878_dna_merged_pass.fastq > na12878_dna_merged_pass.sam
samtools sort -@40 -o na12878_dna_merged_pass.bam na12878_dna_merged_pass.sam
samtools index na12878_dna_merged_pass.bam
3. Methylation calling and frequency calculation:
f5c index -t20 na12878_dna_merged_pass.fastq --skip-slow5-idx --slow5 na12878_dna_merged.blow5
f5c call-methylation -x hpc-low -t20 -g /genome/hg38noAlt.fa -r na12878_dna_merged_pass.fastq -b na12878_dna_merged_pass.bam --slow5 na12878_dna_merged.blow5 > na12878_dna_merged_pass_f5c_meth.tsv
f5c meth-freq -s -i na12878_dna_merged_pass_f5c_meth.tsv -o na12878_dna_merged_pass_f5c_methfreq.tsv
4. Format conversion (to bigwig):
tail -n +2 na12878_dna_merged_pass_f5c_methfreq.tsv | awk '{print $1"\t"$2"\t"$3+1"\t"$7}' | sort -k1,1 -k2,2n > meth_freq.bedgraph
bedGraphToBigWig meth_freq.bedgraph /genome/hg38.chrom.sizes na12878_dna_merged_pass_f5c_methfreq.bigwig
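The conversion in step 4 can be tried on a small synthetic table before running it on the full output. The sketch below is only an illustration: it assumes the column layout that the awk command relies on (chromosome, start and end in columns 1 to 3, methylation frequency in column 7). The header names, the file names `toy_methfreq.tsv` and `toy.bedgraph`, and the three data rows are invented for the example.

```shell
# Hypothetical three-row table. Only the column positions matter to the
# awk command; the header names and all values here are made up.
printf 'chromosome\tstart\tend\tnum_motifs_in_group\tcalled_sites\tcalled_sites_methylated\tmethylated_frequency\tgroup_sequence\n' > toy_methfreq.tsv
printf 'chr2\t500\t500\t1\t10\t5\t0.500\tACGT\n' >> toy_methfreq.tsv
printf 'chr1\t200\t200\t1\t8\t8\t1.000\tACGT\n' >> toy_methfreq.tsv
printf 'chr1\t100\t100\t1\t4\t1\t0.250\tACGT\n' >> toy_methfreq.tsv

# Same pipeline as step 4: drop the header line, print chromosome, start,
# end+1 and frequency separated by tabs, then sort by chromosome name and
# numeric start position.
tail -n +2 toy_methfreq.tsv \
  | awk '{print $1"\t"$2"\t"$3+1"\t"$7}' \
  | sort -k1,1 -k2,2n > toy.bedgraph

cat toy.bedgraph
```

The end coordinate is incremented because bedGraph intervals are half-open, so a site whose start and end are equal in the input becomes a one-base interval. The sorted file is the kind of input that bedGraphToBigWig expects alongside a chromosome sizes file.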
Created:
2024-01-31
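Background overview
This dataset provides nanopore-sequencing methylation frequencies for the NA12878 sample, based on about 40x coverage of sequencing data processed through basecalling, alignment, and methylation calling, and is suited to epigenetics research. It is provided in two formats, tsv.gz and bigwig, to accommodate different analysis needs.
The above content was collected and summarized by 遇见数据集.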
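As a rough illustration of working with the tsv.gz format mentioned in the overview, the sketch below streams a gzipped methylation-frequency table and reports the mean frequency over sites with enough calls. It is a sketch under stated assumptions, not part of the dataset's own pipeline: the file `demo_methfreq.tsv.gz` and its rows are invented, column 5 is assumed to hold the number of called sites and column 7 the methylation frequency, and the threshold of 5 calls is arbitrary.

```shell
# Build a tiny gzipped stand-in for the tsv.gz file (all values invented).
{
  printf 'chromosome\tstart\tend\tnum_motifs_in_group\tcalled_sites\tcalled_sites_methylated\tmethylated_frequency\tgroup_sequence\n'
  printf 'chr1\t100\t100\t1\t4\t1\t0.250\tACGT\n'
  printf 'chr1\t200\t200\t1\t8\t8\t1.000\tACGT\n'
  printf 'chr2\t500\t500\t1\t10\t5\t0.500\tACGT\n'
} | gzip -c > demo_methfreq.tsv.gz

min_calls=5   # arbitrary threshold chosen for this illustration

# Decompress to a stream, skip the header, keep rows whose assumed
# called_sites column (5) meets the threshold, and average column 7.
summary=$(gzip -dc demo_methfreq.tsv.gz \
  | awk -F'\t' -v min="$min_calls" '
      NR > 1 && ($5 + 0) >= (min + 0) { sum += $7; n++ }
      END {
        if (n > 0) printf "%d sites, mean frequency %.3f\n", n, sum / n
        else print "no sites passed the filter"
      }')

echo "$summary"
```

Streaming with `gzip -dc` avoids writing an uncompressed copy, which matters for a genome-wide table; the column positions should be checked against the header of the actual file before relying on the result.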



