Plink files from NYGC 1KG to be used for Cancer GWAS project QBIO475
Source: DataONE. Updated: 2025-10-30. Indexed: 2025-11-01.
Download link:
https://search.dataone.org/view/sha256:fe209d336a621cb3cae203fc368bb5cb16da3c94b716ff7187f97c563ec3d249
Resource description:
Cancer risk is influenced by genetic variation and environment. However, allele frequencies for many cancer-associated variants remain poorly characterized across global populations. To address this gap and provide a framework for teaching population genetics using real human genomic data, undergraduate researchers analyzed population-level allele frequency variation for a curated set of cancer-associated single nucleotide polymorphisms (SNPs). We assembled a set of variants from the GWAS Catalog databases based on reported associations with hereditary cancers, including breast, ovarian, colorectal, and lung cancer. Allele frequencies were extracted from the 1000 Genomes Project across five major continental groups. Students quantified differences in allele frequencies across these populations. The dataset includes curated PLINK files that may be used for future research or educational purposes in human genetics and bioinformatics.
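The per-group comparison described above comes down to counting alleles within each group. Below is a minimal Python sketch, assuming genotypes are already coded as the number of copies of the allele of interest (0, 1, or 2, with `None` for a missing call) and that each sample carries a group label. The function name, the labels, and the toy data are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

def allele_frequency_by_group(dosages, groups):
    """Return {group: allele frequency} for one biallelic SNP.

    dosages: per-sample copies of the allele of interest (0, 1, 2, or None if missing).
    groups:  per-sample group labels, in the same order as dosages.
    """
    allele_count = defaultdict(int)  # copies of the allele seen per group
    chrom_count = defaultdict(int)   # non-missing chromosomes per group
    for dosage, group in zip(dosages, groups):
        if dosage is None:
            continue  # missing calls contribute to neither count
        allele_count[group] += dosage
        chrom_count[group] += 2  # diploid autosomal genotype
    return {g: allele_count[g] / chrom_count[g] for g in chrom_count}

# Toy data, made up for illustration only
dosages = [0, 1, 2, None, 1, 0]
groups = ["AFR", "AFR", "EUR", "EUR", "EUR", "EAS"]
freqs = allele_frequency_by_group(dosages, groups)
print(freqs)
```

Groups whose calls are all missing are left out of the result rather than reported with a frequency of zero, so an absent key signals "no data" and not "allele absent".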
# Plink files from NYGC 1KG to be used for Cancer GWAS project QBIO475
Dataset DOI: [10.5061/dryad.hhmgqnkq7](https://doi.org/10.5061/dryad.hhmgqnkq7)
## Description of the data and file structure
Plink dataset for QBIO475 Cancer GWAS project:\
Biallelic SNPs from 1000 Genomes NYGC hg38\
Filtered to unrelated individuals\
Highly autozygous individuals also removed\
Regions with poor mappability removed\
Strict mask applied\
Labeled with super population
Plink files are:\
allPops.allChroms.snps.QCIndivsForAuto_UnrelsOnly_superPopLabel.bed\
allPops.allChroms.snps.QCIndivsForAuto_UnrelsOnly_superPopLabel.bim\
allPops.allChroms.snps.QCIndivsForAuto_UnrelsOnly_superPopLabel.fam
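
In a PLINK binary fileset, the `.fam` file holds one sample per line (family ID, individual ID, father, mother, sex, phenotype), the `.bim` file holds one variant per line, and the `.bed` file stores genotypes at two bits per sample. The Python sketch below counts samples per family ID and decodes the genotype bytes of a single variant. It assumes the super population label is stored in the family ID column, which this README does not state. The helper names are hypothetical, and PLINK itself would normally be the tool for this kind of work.

```python
from collections import Counter

# Two-bit genotype codes in a SNP-major .bed file, expressed as copies of the
# first allele (A1) listed in the .bim file; code 0b01 means a missing call.
BED_CODE_TO_A1_COPIES = {0b00: 2, 0b01: None, 0b10: 1, 0b11: 0}
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # header of a SNP-major .bed file

def count_fam_labels(fam_lines):
    """Count samples per family ID (column 1 of a .fam file)."""
    return Counter(line.split()[0] for line in fam_lines if line.strip())

def decode_variant(block, n_samples):
    """Decode one variant's bytes into per-sample A1 copy counts."""
    calls = []
    for i in range(n_samples):
        byte = block[i // 4]                   # four samples per byte
        code = (byte >> (2 * (i % 4))) & 0b11  # lowest-order bits come first
        calls.append(BED_CODE_TO_A1_COPIES[code])
    return calls

def read_variant(bed_path, variant_index, n_samples):
    """Read genotypes for the variant at a 0-based row index of the .bim file."""
    bytes_per_variant = (n_samples + 3) // 4
    with open(bed_path, "rb") as handle:
        if handle.read(3) != BED_MAGIC:
            raise ValueError("not a SNP-major PLINK .bed file")
        handle.seek(3 + variant_index * bytes_per_variant)
        return decode_variant(handle.read(bytes_per_variant), n_samples)

# Toy example with made-up samples, not taken from the dataset
fam_lines = [
    "AFR S1 0 0 1 -9",
    "AFR S2 0 0 2 -9",
    "EUR S3 0 0 1 -9",
    "EUR S4 0 0 2 -9",
    "EAS S5 0 0 1 -9",
]
label_counts = count_fam_labels(fam_lines)
# Five samples need two bytes; codes are packed from the low bits upward.
toy_block = bytes([0b11100100, 0b00000010])
toy_calls = decode_variant(toy_block, n_samples=5)
print(dict(label_counts), toy_calls)
```

With the real files, `n_samples` would be the number of lines in the `.fam` file and `variant_index` the row of the SNP of interest in the `.bim` file.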
## Access information
Other publicly accessible locations of the data:
* N/A
Data was derived from the following sources:
* https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased/
Created:
2025-10-31



