
Genotype data for a set of 163 worldwide populations

Source: Mendeley Data. Updated: 2020-02-10. Indexed: 2026-04-09.
Download link:
https://data.mendeley.com/datasets/ckz9mtgrjj/3
Description:
This is a combined dataset of genotypes for 2,643 individuals from 163 worldwide human populations. All genotypes were generated on Illumina chips (550, 610, 660) across multiple studies. The dataset was compiled mainly for two papers: Hellenthal et al. (2014), "A Genetic Atlas of Human Admixture History", Science; and Busby et al. (2015), "The role of recent admixture in forming the contemporary West Eurasian genomic landscape", Current Biology. The data are in PLINK format, and the file BusbyWorldwidePopulations.csv records where each constituent dataset comes from.

Because the two datasets were merged, not all populations are typed on the same set of SNPs. Genotype data are included for 523,443 SNPs, of which 441,038 are genotyped in at least 97.5% of individuals. Additional QC steps are therefore required to filter this set down to high-quality calls, and the appropriate filtering depends on the subset of samples being used.

Complete information about the populations is available in the publications cited in the associated paper. The same populations are also available elsewhere; this dataset is the version compiled for the two papers above.

UPDATE 11/11/2019: Thanks to some heroic work by Kristján Helgi Swerford Moore at DECODE, I have now updated the population and sample information to label the individuals more accurately and verbosely.
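The call-rate filtering mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' QC pipeline: the toy genotype matrix, the use of `None` for a missing call, and the function names `snp_call_rates` and `snps_passing` are all assumptions made for illustration. It only shows why the set of SNPs passing a 97.5% call-rate threshold depends on which samples are selected. In practice the same kind of per-variant missingness filter is usually applied directly to the PLINK files, for example with PLINK's `--geno` option.

```python
from typing import List, Optional, Sequence

# Assumption: genotypes are coded as minor-allele counts (0, 1, 2),
# and a missing call is represented by None in this toy encoding.
Genotype = Optional[int]


def snp_call_rates(genotypes: Sequence[Sequence[Genotype]]) -> List[float]:
    """Return the fraction of non-missing calls for each SNP (column)."""
    n_samples = len(genotypes)
    if n_samples == 0:
        return []
    n_snps = len(genotypes[0])
    rates = []
    for j in range(n_snps):
        called = sum(1 for row in genotypes if row[j] is not None)
        rates.append(called / n_samples)
    return rates


def snps_passing(
    genotypes: Sequence[Sequence[Genotype]],
    sample_idx: Sequence[int],
    min_call_rate: float = 0.975,
) -> List[int]:
    """Indices of SNPs whose call rate among the chosen samples meets the threshold."""
    subset = [genotypes[i] for i in sample_idx]
    return [j for j, rate in enumerate(snp_call_rates(subset)) if rate >= min_call_rate]


# Toy matrix: 4 individuals (rows) x 3 SNPs (columns).
toy = [
    [0, 1, None],
    [1, None, None],
    [2, 2, 0],
    [0, 1, 1],
]

all_samples = snps_passing(toy, range(4))   # filter using every individual
two_samples = snps_passing(toy, [2, 3])     # filter using only the last two individuals
print(all_samples, two_samples)
```

In the toy data, the last two individuals were typed on all three SNPs while the first two were not, so restricting to that subset retains more SNPs than filtering across everyone. This mirrors the situation described above, where populations typed on different chips share only part of the SNP set.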

Created: 2020-02-10