Supporting data for "Deep whole-genome sequencing of 90 Han Chinese genomes"
Source: DataCite Commons. Updated 2025-05-26; indexed 2025-04-15.
Download link:
http://gigadb.org/dataset/100302
Description:
Next-generation sequencing provides high-resolution insight into human genetic information. However, because sequencing is expensive, previous studies have focused primarily on low-coverage data. The 1000 Genomes Project and the Haplotype Reference Consortium have both provided powerful reference panels for imputation. Even so, low-frequency and novel variants remain difficult to discover and call accurately from low-coverage data. Deep sequencing addresses this problem for low-frequency and novel variants. Whole exome sequencing is a viable alternative for exonic regions, but it does not cover noncoding regions and can therefore miss important causal variants. For Han Chinese populations, most known variants were discovered from the low-coverage data of the 1000 Genomes Project. High-coverage whole-genome sequencing data, however, remain limited for every population, and a large number of low-frequency, population-specific variants remain uncharacterized.
We performed high-depth (~80X) whole-genome sequencing of 90 unrelated individuals of Chinese ancestry. These individuals were drawn from the 1000 Genomes Project samples: 45 North Han Chinese and 45 South Han Chinese. Of these 90 individuals, 83 had not been sequenced by the 1000 Genomes Project. From these 90 samples we identified 12,568,804 single nucleotide polymorphisms (SNPs), 2,074,734 short InDels, and 26,142 structural variants. Compared with the Han Chinese data from the 1000 Genomes Project, we found 7,007,685 novel low-frequency variants (minor allele frequency < 5%). These comprise 5,816,839 SNPs, 1,172,919 InDels, and 17,927 structural variants.
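The low-frequency criterion (minor allele frequency < 5%) and the novelty test against the 1000 Genomes Han Chinese data can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes biallelic sites, diploid genotypes encoded as pairs of allele indices (0 = reference, 1 = alternate), and a hypothetical set of known variant keys that stands in for the 1000 Genomes call set. The function names and the `(chrom, pos, ref, alt)` key are illustrative. Exact key matching is a simplification: real comparisons normalize InDel representations and match structural variants by overlap.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
VariantKey = Tuple[str, int, str, str]
# Hypothetical genotype encoding: (allele1, allele2) with 0 = ref, 1 = alt; None = missing call
Genotype = Optional[Tuple[int, int]]

LOW_FREQ_THRESHOLD = 0.05  # "low frequency" = MAF < 5%, per the dataset description


def minor_allele_frequency(genotypes: Iterable[Genotype]) -> float:
    """Compute the minor allele frequency of a biallelic site from diploid genotypes.

    Missing calls (None) are skipped, so the denominator counts only called alleles.
    """
    alt_count = 0
    called_alleles = 0
    for gt in genotypes:
        if gt is None:
            continue
        alt_count += gt[0] + gt[1]  # valid only for 0/1 (biallelic) allele indices
        called_alleles += 2
    if called_alleles == 0:
        raise ValueError("no called genotypes at this site")
    alt_freq = alt_count / called_alleles
    # The minor allele is whichever allele is rarer in this sample set.
    return min(alt_freq, 1.0 - alt_freq)


def is_novel_low_frequency(
    key: VariantKey,
    genotypes: Iterable[Genotype],
    known: Set[VariantKey],
) -> bool:
    """True if the site is absent from the known set AND has MAF below the threshold."""
    if key in known:
        return False
    return minor_allele_frequency(genotypes) < LOW_FREQ_THRESHOLD


if __name__ == "__main__":
    # One heterozygous carrier among 90 samples: MAF = 1 / 180
    site = ("chr1", 100, "A", "G")
    gts = [(0, 1)] + [(0, 0)] * 89
    print(minor_allele_frequency(gts))
    print(is_novel_low_frequency(site, gts, known=set()))
```

In this example, a site where only 1 of the 90 samples is heterozygous has MAF = 1/180 ≈ 0.0056. That is well below the 5% cutoff, so the site counts as a novel low-frequency variant whenever it is missing from the known set.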
Using deep sequencing data, we have built a greatly expanded spectrum of genetic variation for the Han Chinese genome. Compared with the 1000 Genomes Project data, this deep sequencing dataset improves the characterization of a large number of low-frequency and novel variants. It will be a valuable resource for advancing Chinese genetics research and medical development. It will also be a useful supplement to the 1000 Genomes Project and to other human genome projects.
Provided by:
GigaScience Database
Created:
2017-07-10



