Genome at Juncture of Early Human Migration: A Systematic Analysis of Two Whole Genomes and Thirteen Exomes from Kuwaiti Population Subgroup of Inferred Saudi Arabian Tribe Ancestry
Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/_Genome_at_Juncture_of_Early_Human_Migration_A_Systematic_Analysis_of_Two_Whole_Genomes_and_Thirteen_Exomes_from_Kuwaiti_Population_Subgroup_of_Inferred_Saudi_Arabian_Tribe_Ancestry_/1093692
Resource description:
The population of the State of Kuwait comprises three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 single nucleotide polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels and 374,959 (74.08%) non-exomic indels are ‘novel’. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs occur at low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2×10⁻¹⁶). A set of 2,485 SNPs shows significantly different allele frequencies when compared with populations from other continents. Two notable variants with risk alleles at high frequency in this subgroup are: a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T]) in the CYP4F2 gene [MIM:*604426], associated with the warfarin dosage [MIM:#122700] required to elicit a normal anticoagulant response; and a 3′ UTR SNP (rs6151429 [22:g.51063477T>C]) in the ARSA gene [MIM:*607574], associated with metachromatic leukodystrophy [MIM:#250100]. The Hemoglobin Riyadh variant (first identified in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives. These natives are believed to have received substantial gene flow from Africa and from populations of eastern provenance.
We present the first genome resource for the Saudi Arabian tribe subgroup, which is essential for designing future genetic studies of this subgroup. The full-length genome sequences and the identified variants are available at ftp://dgr.dasmaninstitute.org and http://dgr.dasmaninstitute.org/DGR/gb.html.
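The abstract relies on two simple frequency statistics:

* The minor allele frequency (MAF). A variant counts as low frequency when its MAF is below 5%.
* The Pearson correlation coefficient between allele frequencies measured in two cohorts.

The Python sketch below illustrates both. It is not the authors' pipeline. It assumes biallelic sites, with diploid genotypes coded as 0, 1 or 2 copies of the alternate allele. The function names and example values are hypothetical.

```python
"""Minimal sketch of the MAF and Pearson-correlation statistics (hypothetical, not the authors' code)."""
from math import sqrt
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """Return the MAF at one biallelic site.

    Assumption: each genotype is the number (0, 1 or 2) of alternate
    alleles carried by one diploid individual.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def is_low_frequency(maf: float, threshold: float = 0.05) -> bool:
    """Flag a variant as low frequency (MAF < 5%, the cut-off quoted in the abstract)."""
    return maf < threshold


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical genotypes for 15 individuals at one site: a single heterozygote.
    site = [0] * 14 + [1]
    maf = minor_allele_frequency(site)
    print(f"MAF = {maf:.4f}, low frequency: {is_low_frequency(maf)}")
    # prints: MAF = 0.0333, low frequency: True

    # Hypothetical alternate-allele frequencies of the same five SNPs in two cohorts.
    cohort_a = [0.10, 0.25, 0.40, 0.55, 0.70]
    cohort_b = [0.12, 0.22, 0.43, 0.50, 0.74]
    print(f"Pearson r = {pearson_r(cohort_a, cohort_b):.3f}")
```

The sketch does not reproduce the authors' selection of common SNPs or their significance test. In practice, `scipy.stats.pearsonr` returns both the coefficient and a p-value.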
Created: 2016-01-15



