
SARS-CoV-2 GISAID isolates (2020-05-05) genotyping VCF

Indexed by NIAID Data Ecosystem on 2026-03-11
Download link:
https://data.mendeley.com/datasets/x4t94w9njt
Description:
VCF file containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID EpiCoV. The file is tab-delimited, with 5,892 rows (including the header row of column names) and 11,911 columns.

The columns are, in order: the viral genome accession ID; the nucleotide position in the genome; the mutation ID (blank in all rows); the reference nucleotide; the alternative (mutant) nucleotides, numbered in order of appearance; the quality, filter, and information columns (all blank); the format column (GT in all rows); a column for the reference genome (0 in all rows, referring to the reference nucleotide column); and one column per isolate genome.

Each row corresponds to the nucleotide at the position given in the POS column. For each isolate, the value is 0 if the isolate carries the reference (non-mutant) nucleotide. Otherwise it is an integer of 1 or greater that identifies which mutation the isolate carries, by its order of appearance in the ALT column.

The file was generated to test the hypothesis that mutations in the RdRp protein of SARS-CoV-2 significantly affect the virus's mutation rate, by examining their correlation with the mutation load of the membrane or envelope proteins. The results indicate that the most common mutation, 14408C>T, increases the mutation rate, while the other common mutations can lower it. These results were obtained by counting, for each mutated nucleotide, the number of isolates whose standard nucleotide call disagrees with the reference sequence, and separating these into categorical variables to be analysed together with isolate date and location.
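The genotype encoding described above can be decoded with a short script. The following is a minimal Python sketch, not the authors' pipeline. It assumes that every isolate cell holds an integer GT index and that the column layout matches the description (nine fixed VCF columns, then the reference column, then the isolates). The function names (`read_sites`, `non_reference_counts`, `load_by_carrier`), the sample rows, and the E-gene span used in the example are hypothetical illustrations.

The sketch does three things:

- It maps each GT value back to a nucleotide (0 is REF; k is the k-th ALT allele).
- It counts, per site, how many isolates disagree with the reference.
- It splits isolates by whether they carry 14408C>T and tallies each isolate's mutated sites within a chosen region.

```python
import csv
from collections import defaultdict

REF_COL = 9  # 0-based index of the reference-genome column, after the 9 fixed VCF columns

def read_sites(lines):
    """Stream (pos, ref, alts, {isolate: nucleotide}) for each data row."""
    rows = csv.reader(
        (ln for ln in lines if ln.strip() and not ln.startswith("##")),
        delimiter="\t",
    )
    isolates = next(rows)[REF_COL + 1:]  # header row: isolate names follow REF_COL
    for row in rows:
        pos, ref = int(row[1]), row[3]
        alts = row[4].split(",") if row[4] else []
        calls = {}
        for name, gt in zip(isolates, row[REF_COL + 1:]):
            if not gt.isdigit():
                continue  # assumption: treat non-integer GT values (e.g. ".") as missing
            idx = int(gt)  # 0 = REF; k >= 1 = k-th allele listed in ALT
            calls[name] = ref if idx == 0 else alts[idx - 1]
        yield pos, ref, alts, calls

def non_reference_counts(sites):
    """Per position, the number of isolates whose call differs from REF."""
    return {pos: sum(nt != ref for nt in calls.values())
            for pos, ref, _alts, calls in sites}

def load_by_carrier(sites, marker, start, end):
    """Group per-isolate mutation load in [start, end] by carrier status.

    marker is a (position, alt_nucleotide) pair, e.g. (14408, "T") for 14408C>T.
    Returns {True: [loads of carriers], False: [loads of non-carriers]}.
    """
    marker_pos, marker_alt = marker
    carrier = {}
    load = defaultdict(int)
    for pos, ref, _alts, calls in sites:
        for name, nt in calls.items():
            if pos == marker_pos:
                carrier[name] = nt == marker_alt
            if start <= pos <= end and nt != ref:
                load[name] += 1
    groups = {True: [], False: []}
    for name, is_carrier in carrier.items():
        groups[is_carrier].append(load[name])
    return groups

# Hypothetical three-site, three-isolate excerpt in the described layout;
# the accession, REF/ALT bases and genotypes are invented for illustration.
EXAMPLE = "\n".join([
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tREFERENCE\tiso_A\tiso_B\tiso_C",
    "acc\t241\t\tC\tT\t\t\t\tGT\t0\t1\t1\t0",
    "acc\t14408\t\tC\tT,A\t\t\t\tGT\t0\t1\t0\t2",
    "acc\t26300\t\tG\tA\t\t\t\tGT\t0\t1\t0\t0",
])

print(non_reference_counts(read_sites(EXAMPLE.splitlines())))
# → {241: 2, 14408: 2, 26300: 1}

# Assumed E-gene span (26245-26472, NC_045512.2 annotation); verify before use.
print(load_by_carrier(read_sites(EXAMPLE.splitlines()), (14408, "T"), 26245, 26472))
# → {True: [1], False: [0, 0]}
```

Columns are addressed by index rather than by header name because the description names only some of the headers (POS, ALT). Because `read_sites` is a generator, each analysis streams the rows in a single pass. For the full 5,892 × 11,911 table, you can pass an open file handle instead of a list of lines, and call `read_sites` again for each analysis rather than keeping every row in memory.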
Created: 2020-05-19