Great ape segregating sites
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/Great_ape_segregating_sites/25513405
Description:
Segregating sites in the three great ape clades: Pan, Gorilla, Pongo.
Pan: bonobos (Pan paniscus) and chimpanzees (Pan troglodytes, 4 subspecies)
Gorilla: Eastern gorillas (Gorilla beringei, 2 subspecies) and Western gorillas (Gorilla gorilla, 2 subspecies)
Pongo: Sumatran orangutan (Pongo abelii), Bornean orangutan (Pongo pygmaeus), Tapanuli orangutan (Pongo tapanuliensis).
The dataset covers 141 individuals in total. It provides, in VCF format, the sites segregating within each clade among wild-born individuals with more than 12-fold genome-wide average coverage.
Raw data were retrieved from the European Nucleotide Archive (ENA) using the identifiers from the original studies. Adapters were removed with Trimmomatic 0.39, and reads were mapped with bwa mem 0.7.16a to the human reference genome hg38 (GCA_000001405.15_GRCh38_no_alt_analysis_set.fna). Duplicates were removed with gatk 4.1.4.0 MarkDuplicatesSpark, and alignment files were merged with samtools 1.14. Genotypes were called with gatk 4.1.4.0 HaplotypeCaller and combined with gatk 4.1.4.0 GenomicsDBImport and GenotypeGVCFs, and the resulting files were concatenated with bcftools 1.19 concat.
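Because reads were mapped to the human reference, a site where every individual in a clade carries the same non-reference allele is a fixed difference from human, not a segregating site within that clade. The following is a minimal Python sketch of that distinction, assuming a standard VCF layout with a GT field. The function name, the sample names (ind1, ind2), and the example records are hypothetical and are not taken from the dataset.

```python
import io


def count_segregating_sites(vcf_text):
    """Return (segregating, total) record counts for VCF text.

    A record counts as segregating when the called genotypes of its
    samples carry at least two distinct alleles. Missing calls (".")
    are ignored.
    """
    segregating = 0
    total = 0
    for line in io.StringIO(vcf_text):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # no genotype information in this record
        gt_index = format_keys.index("GT")
        alleles = set()
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Treat phased ("|") and unphased ("/") genotypes alike.
            for allele in genotype.replace("|", "/").split("/"):
                if allele != ".":
                    alleles.add(allele)
        total += 1
        if len(alleles) > 1:
            segregating += 1
    return segregating, total


# Hypothetical three-record example with two samples.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "ind1", "ind2"]),
    # Both alleles present among the samples: segregating.
    "\t".join(["chr1", "100", ".", "A", "G", "50", "PASS", ".", "GT", "0/1", "0/0"]),
    # All samples homozygous alternate: fixed difference, not segregating.
    "\t".join(["chr1", "200", ".", "C", "T", "50", "PASS", ".", "GT", "1/1", "1/1"]),
    # One missing call, one phased heterozygote: segregating.
    "\t".join(["chr1", "300", ".", "G", "A", "50", "PASS", ".", "GT", "./.", "0|1"]),
]) + "\n"

print(count_segregating_sites(EXAMPLE_VCF))
```

For a compressed file, the text could be read with `gzip.open(path, "rt")` from the standard library. For files of this size, a dedicated tool such as bcftools is likely the more practical choice.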
Created:
2024-04-08



