
DCEG Imputation Reference Dataset

NIAID Data Ecosystem, indexed 2026-05-16
Download link:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000396.v1.p1
Description:
We have built a new resource for imputation of SNPs for existing and future genome-wide association studies (GWAS), known as the Division of Cancer Epidemiology and Genetics (DCEG) Reference Set. The first build of the data set includes 728 cancer-free individuals of European descent from three large prospectively sampled studies, 98 African-American individuals from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), 74 Chinese individuals from a clinical trial in Shanxi, China (SHNX), and 349 unrelated individuals from the HapMap Project (see Molecular Data Section for details on arrays used). The final harmonized dataset includes 2.8 million autosomal polymorphic SNPs on 1,249 subjects after rigorous quality control metrics were applied.

An established quality control (QC) process was applied to samples by study (referred to as "QC Groups") to ensure that only high-quality genotypes were retained for the analytic data set. QC metrics included completion rates by sample or locus, sample heterozygosity rate, and duplicate concordance rate; standard thresholds for exclusion of data generated per array were applied. The results of 198 arrays from 153 different individuals were excluded. We also excluded individuals and loci with discordance rates greater than 1% after merging the genotypes generated from different arrays, resulting in the exclusion of 5 individuals (2 ATBC, 1 CPS II and 2 PLCO). Assays from the Illumina Hap1, Omni1 and Omni2.5 arrays were harmonized based on the locus meta-data of the 1000 Genomes June 2010 release and the HapMap 3 February 2009 release. An additional 763 loci were excluded due to incompatible alleles (matching neither directly nor by reverse complementing) between our data and the public reference data.

A total of 906 individuals were chosen from two clinical trials and two prospective cohorts: the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study (ATBC), the Cancer Prevention Study-II of the American Cancer Society (CPS II), the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) and the Shanxi Upper Gastrointestinal Cancer Genetics Project (SHNX). All individuals were cancer-free and over the age of 55 at last ascertainment. Individuals of European ancestry were selected from ATBC, CPS II and PLCO; African Americans from PLCO; and East Asians from SHNX. Illumina, Inc. provided data files for 446 Coriell individuals from the HapMap 3 CEU, TSI, JPT, CHB and YRI populations, genotyped on the Illumina Omni 2.5 array. For 74 SHNX individuals, genotype data were available for the Illumina Hap660 array 12 as well as the Omni 2.5 array. 95 African American samples from the Multi Ethnic Cohort (MEC) were genotyped at USC with the Illumina Hap1 and the Omni 2.5 arrays.
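
The allele-compatibility check mentioned in the description (alleles matching either directly or by reverse complementing) can be illustrated with a minimal Python sketch. This is not the DCEG pipeline's code; the function names `reverse_complement` and `alleles_compatible` and the return labels are hypothetical, chosen only for illustration, and alleles are assumed to be given as plain nucleotide strings.

```python
# Illustrative sketch only; names and return labels are assumptions,
# not taken from the DCEG harmonization pipeline.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(allele):
    """Return the reverse complement of a nucleotide string (e.g. 'AC' -> 'GT')."""
    return "".join(COMPLEMENT[base] for base in reversed(allele.upper()))


def alleles_compatible(study_alleles, reference_alleles):
    """Compare a locus's allele pair against the reference allele pair.

    Returns 'direct' if the allele sets match as given, 'flipped' if they
    match after reverse complementing the study alleles, and None if they
    are incompatible either way (a locus that would be excluded).
    """
    study = {a.upper() for a in study_alleles}
    reference = {a.upper() for a in reference_alleles}
    if study == reference:
        return "direct"
    if {reverse_complement(a) for a in study} == reference:
        return "flipped"
    return None


# Example calls with made-up allele pairs
print(alleles_compatible(("A", "G"), ("G", "A")))  # same alleles, order ignored
print(alleles_compatible(("A", "G"), ("T", "C")))  # opposite strand
print(alleles_compatible(("A", "G"), ("A", "C")))  # incompatible
```

Note that strand-ambiguous loci (A/T or C/G) match both directly and after flipping, so a check of this kind cannot determine their strand from the alleles alone; this sketch simply reports them as a direct match.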
Created:
2011-10-06