
SNP genotype matrix for GWAS and Machine Learning analyses

Indexed from NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/5564293
Description:
SNP datasets used for GWAS and Machine Learning analyses

All datasets come from the easyGWAS website: https://easygwas.ethz.ch/down/1/

=== Horton et al. 2012 ===

1307 Arabidopsis genotypes x 214,057 SNPs

1) In the form of a genotype matrix

The file is called Horton2012.raw
https://www.nature.com/articles/ng.1042

Preview of the first lines and columns:

    FID     Chr1_657_T  Chr1_3102_G  Chr1_4648_A  Chr1_4880_T  Chr1_5975_G  Chr1_6063_T  Chr1_6449_C
    9381    2           2            2            0            0            0            0
    9380    0           0            0            0            0            0            2
    9378    2           2            2            0            0            0            0
    9371    2           2            2            0            0            0            0
    9367    0           0            0            2            0            0            0
    9363    2           2            2            0            0            0            0
    9356    0           2            2            0            0            0            0
    9355    2           2            2            0            0            0            0
    9354    2           2            2            0            0            0            0
    ...etc...

PLINK 1.9 was used to convert the .ped and .map files to the .raw format with:

    plink --file original_data/genotype --recodeA --tab

Genotypes are encoded as 0, 1 or 2 with:

    SNP       SNP_A
    ---       -----
    A A   ->    0
    A C   ->    1
    C C   ->    2
    0 0   ->   NA

Then only the Family ID (which is the same as the individual ID) was kept, and the other columns (Paternal ID, Maternal ID, Sex, Phenotype) were removed.

The corresponding PLINK manual page is here: https://zzz.bwh.harvard.edu/plink/dataman.shtml#recode

2) In the form of a set of files compatible with PLINK out of the box

The archive file is called AtPolyDB_call_method_75_Horton2012.tar.gz and contains three files:

    genotype.ped: pedigree information for the 1307 ecotypes
    genotype.map: the SNP positions on the genome
    phenotypes.pheno: the phenotype values of the 1307 ecotypes
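The genotype matrix described above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the authors' own loading code. It assumes the file is whitespace-delimited (tabs or spaces), that missing genotypes appear as the literal string NA, and that each column name has the form chromosome_position_allele, as the preview suggests. The function names `read_raw_matrix` and `parse_snp_name` are hypothetical helpers introduced for illustration. The embedded sample repeats the first three rows and first four SNP columns of the preview so the block runs without the real file.

```python
import io


def read_raw_matrix(handle):
    """Parse a genotype matrix with an FID column followed by one column per SNP.

    Returns (sample_ids, snp_names, rows); each row is a list of 0/1/2 ints,
    with None standing in for missing genotypes (assumed to be written as NA).
    """
    header = handle.readline().split()
    snp_names = header[1:]  # the first column is the FID
    sample_ids, rows = [], []
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        sample_ids.append(fields[0])
        rows.append([None if v == "NA" else int(v) for v in fields[1:]])
    return sample_ids, snp_names, rows


def parse_snp_name(name):
    """Split a column name such as 'Chr1_657_T' into (chromosome, position, allele).

    Assumption: the suffix after the last underscore is the allele reported in
    the column, and the remainder is chromosome_position.
    """
    variant_id, allele = name.rsplit("_", 1)
    chrom, pos = variant_id.split("_", 1)
    return chrom, int(pos), allele


# First rows and columns of the preview shown in the description.
preview = """FID Chr1_657_T Chr1_3102_G Chr1_4648_A Chr1_4880_T
9381 2 2 2 0
9380 0 0 0 0
9378 2 2 2 0
"""

sample_ids, snp_names, rows = read_raw_matrix(io.StringIO(preview))
print(sample_ids)
print([parse_snp_name(n) for n in snp_names])
print(rows)
```

To read the full matrix, pass an open file instead of the in-memory sample, for example `with open("Horton2012.raw") as fh: read_raw_matrix(fh)`. With 1307 rows and 214,057 columns, a list of lists is workable but memory-hungry, so an array library may be a better fit for downstream analyses.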
Created:
2023-03-22