SNP genotype matrix for GWAS and Machine Learning analyses
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/5564293
Description:
SNP datasets used for GWAS and Machine Learning analyses
All datasets come from the easyGWAS website: https://easygwas.ethz.ch/down/1/
=== Horton et al. 2012 ===
1307 Arabidopsis genotypes x 214,057 SNPs
1) In the form of a genotype matrix
The file is called Horton2012.raw.
Publication: https://www.nature.com/articles/ng.1042
Preview of the first lines and columns:
FID Chr1_657_T Chr1_3102_G Chr1_4648_A Chr1_4880_T Chr1_5975_G Chr1_6063_T Chr1_6449_C
9381 2 2 2 0 0 0 0
9380 0 0 0 0 0 0 2
9378 2 2 2 0 0 0 0
9371 2 2 2 0 0 0 0
9367 0 0 0 2 0 0 0
9363 2 2 2 0 0 0 0
9356 0 2 2 0 0 0 0
9355 2 2 2 0 0 0 0
9354 2 2 2 0 0 0 0
...etc...
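As an illustration, a matrix in this layout can be read with a few lines of Python. This is a minimal sketch using only the standard library. It assumes the file is whitespace-delimited with the sample ID in the first column, as in the preview. `read_raw` is a hypothetical helper name, and the embedded rows are a truncated copy of the preview, not the full file.

```python
import io

# A few rows and the first four SNP columns copied from the preview above.
PREVIEW = """\
FID Chr1_657_T Chr1_3102_G Chr1_4648_A Chr1_4880_T
9381 2 2 2 0
9380 0 0 0 0
9367 0 0 0 2
"""


def read_raw(handle):
    """Parse a whitespace-delimited .raw genotype matrix.

    Returns (sample_ids, snp_names, matrix), where matrix[i][j] is the
    allele count (0, 1 or 2) of sample i at SNP j, or None for "NA".
    """
    header = handle.readline().split()
    snp_names = header[1:]  # the first column holds the sample ID (FID)
    sample_ids, matrix = [], []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"unexpected column count for sample {fields[0]}")
        sample_ids.append(fields[0])
        matrix.append([None if v == "NA" else int(v) for v in fields[1:]])
    return sample_ids, snp_names, matrix


sample_ids, snp_names, matrix = read_raw(io.StringIO(PREVIEW))
print(len(sample_ids), "samples x", len(snp_names), "SNPs")
```

For the real file, the same function could be called on `open("Horton2012.raw")`. With 1307 x 214,057 values, a nested Python list is memory-hungry, so an array library is probably a better fit for actual analyses.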
PLINK 1.9 was used to convert the .ped and .map files to the .raw format with:
plink --file original_data/genotype --recodeA --tab
Genotypes are encoded as 0, 1 or 2 with:
SNP       SNP_A
---       -----
A A   ->    0
A C   ->    1
C C   ->    2
0 0   ->   NA
Only the family ID (FID) column was then kept, since it is identical to the individual ID; the other metadata columns (Paternal ID, Maternal ID, Sex, Phenotype) were removed.
The corresponding PLINK manual page used is here: https://zzz.bwh.harvard.edu/plink/dataman.shtml#recode
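The encoding table above can be reproduced with a small Python sketch. It is only an illustration of additive (allele-count) coding, not PLINK's own implementation. `additive_code` and `split_column_name` are hypothetical helper names. The sketch assumes that the suffix after the last underscore in a .raw column name (for example T in Chr1_657_T) is the allele being counted, and that "0" marks a missing allele.

```python
def additive_code(allele1, allele2, counted_allele):
    """Count copies of counted_allele in a genotype; None if missing."""
    if allele1 == "0" or allele2 == "0":
        return None  # written as NA in the .raw file
    return int(allele1 == counted_allele) + int(allele2 == counted_allele)


def split_column_name(name):
    """Split a .raw column name such as 'Chr1_657_T' into (SNP ID, allele)."""
    variant_id, counted_allele = name.rsplit("_", 1)
    return variant_id, counted_allele


# Rows of the table above, with C as the counted allele.
for genotype in [("A", "A"), ("A", "C"), ("C", "C"), ("0", "0")]:
    print(genotype, "->", additive_code(*genotype, counted_allele="C"))
```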
2) In the form of a set of files compatible with PLINK out of the box
The archive file is called AtPolyDB_call_method_75_Horton2012.tar.gz and contains three files:
genotype.ped: pedigree and genotype information for the 1307 ecotypes
genotype.map: the SNP positions on the genome
phenotypes.pheno: the phenotype value of the 1307 ecotypes
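For readers who want to inspect the .ped and .map files without PLINK, here is a minimal Python sketch. It assumes the standard PLINK text layouts: four columns per .map line (chromosome, SNP ID, genetic distance, base-pair position), and six leading columns per .ped line followed by two alleles per SNP. The sample lines are made up for illustration and are not taken from the archive; `read_map` and `read_ped` are hypothetical helper names.

```python
import io

# Hypothetical two-SNP, two-sample excerpts (illustrative values only).
MAP_TEXT = "1\tChr1_657\t0\t657\n1\tChr1_3102\t0\t3102\n"
PED_TEXT = (
    "9381 9381 0 0 0 -9 T T G G\n"
    "9380 9380 0 0 0 -9 C C 0 0\n"
)


def read_map(handle):
    """Return a list of (chromosome, snp_id, base_pair_position) tuples."""
    snps = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, snp_id, _genetic_distance, position = fields
        snps.append((chrom, snp_id, int(position)))
    return snps


def read_ped(handle, n_snps):
    """Return {individual_id: [(allele1, allele2), ...]} for each sample."""
    genotypes = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        # Leading columns: FID, IID, paternal ID, maternal ID, sex, phenotype.
        iid, alleles = fields[1], fields[6:]
        if len(alleles) != 2 * n_snps:
            raise ValueError(f"sample {iid}: expected {2 * n_snps} alleles")
        genotypes[iid] = list(zip(alleles[0::2], alleles[1::2]))
    return genotypes


snps = read_map(io.StringIO(MAP_TEXT))
genotypes = read_ped(io.StringIO(PED_TEXT), n_snps=len(snps))
print(snps[0], genotypes["9381"][0])
```

Passing the number of SNPs from the .map file into `read_ped` gives a cheap consistency check between the two files.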
Created:
2023-03-22



