SARS-CoV-2 GISAID isolates (2020-05-05) genotyping VCF
Source: doi.org, indexed 2025-03-27
Download link:
http://doi.org/10.17632/x4t94w9njt.1
Description:
VCF file containing filtered mutated sites in SARS-CoV-2 genomes obtained from GISAID Epi-CoV. The columns are, in order: the viral genome accession ID (CHROM); the nucleotide position in the genome (POS); the mutation ID (ID, left blank in all rows); the reference nucleotide (REF); the potential mutations, numbered in order of appearance (ALT); quality, filter, and information (QUAL, FILTER, INFO, all left blank); the format (FORMAT, GT in all rows); a column for the reference genome (all 0, referring to the REF nucleotide); and one column per isolate genome. In each row, an isolate's value identifies the nucleotide at the POS position: 0 if it is non-mutant or, if it is mutant, an integer equal to or greater than 1 giving the position of its mutation in the ALT column (1 for the first listed mutation, 2 for the second, and so on). The file is tab-delimited, with 5,892 rows including the header row, and 11,911 columns.
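As a minimal sketch, assuming the file follows the column layout described above with a single header row and comma-separated ALT values as in the VCF specification, the genotype matrix could be read in Python as follows. The `Site` class, the functions `read_gt_vcf` and `allele`, and the toy accession and isolate names are hypothetical, for illustration only. If the layout is exactly as described, the 11,911 columns would be 9 fixed VCF columns, 1 reference column, and 11,901 isolate columns.

```python
from __future__ import annotations

import io
from dataclasses import dataclass

FIXED_COLUMNS = 9  # CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT


@dataclass
class Site:
    chrom: str
    pos: int
    ref: str
    alts: list[str]          # ALT nucleotides, in order of appearance
    calls: list[int | None]  # one GT index per column after FORMAT


def read_gt_vcf(handle):
    """Return (column_names, sites) for a tab-delimited, GT-only VCF."""
    columns = None
    sites = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and any VCF meta-information lines
        fields = line.split("\t")
        if columns is None:
            # First remaining line is the header: reference column, then isolates.
            columns = fields[FIXED_COLUMNS:]
            continue
        alts = [] if fields[4] in ("", ".") else fields[4].split(",")
        # Non-numeric calls (e.g. ".") become None instead of being guessed.
        calls = [int(gt) if gt.isdigit() else None for gt in fields[FIXED_COLUMNS:]]
        sites.append(Site(fields[0], int(fields[1]), fields[3], alts, calls))
    return columns, sites


def allele(site, gt):
    """Translate a GT index: 0 -> REF, k >= 1 -> k-th ALT (1-based)."""
    if gt is None:
        return None
    return site.ref if gt == 0 else site.alts[gt - 1]


# Toy two-isolate file; the accession and column names are made up.
demo = io.StringIO(
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\treference\tiso_A\tiso_B\n"
    "hypothetical_acc\t14408\t\tC\tT\t\t\t\tGT\t0\t1\t0\n"
)
columns, sites = read_gt_vcf(demo)
site = sites[0]
for name, gt in zip(columns, site.calls):
    print(name, site.pos, allele(site, gt))
```

Blank ID, QUAL, FILTER, and INFO cells survive the split as empty strings, so column positions stay aligned with the fixed VCF order.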
The file was generated to test whether mutations in the RdRp protein of SARS-CoV-2 significantly affect the virus's mutation rate, by examining their correlation with the mutation load of the membrane (M) or envelope (E) proteins. The results indicate that the most common mutation, 14408C>T, increases the mutation rate, whereas the other common mutations may lower it. These results were obtained by counting, for each mutated nucleotide, the isolates whose standard nucleotide call disagrees with the reference sequence, and separating them into categorical variables, which were analysed together with isolate date and location.
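The per-site counting and grouping described above could be sketched roughly as below. This is not the authors' analysis: it assumes calls are already parsed into `(POS, calls)` pairs with the reference column dropped, uses E and M gene coordinates from the NC_045512.2 (Wuhan-Hu-1) annotation as an assumption to verify against the reference actually used, treats carrier status at position 14408 as the only categorical variable, and ignores isolate date and location. The function names and the toy data are hypothetical.

```python
from statistics import mean

# Assumed gene bounds (inclusive, NC_045512.2 numbering); verify against the
# reference genome actually used to build the VCF.
E_GENE = (26245, 26472)
M_GENE = (26523, 27191)


def carriers_per_site(sites):
    """For each POS, count isolates whose call disagrees with REF (GT >= 1)."""
    return {pos: sum(1 for gt in calls if gt >= 1) for pos, calls in sites}


def load_in_region(sites, isolate, region):
    """Count non-reference calls for one isolate within an inclusive region."""
    start, end = region
    return sum(
        1 for pos, calls in sites if start <= pos <= end and calls[isolate] >= 1
    )


def mean_load_by_marker(sites, marker_pos, regions):
    """Split isolates by whether they carry any ALT at marker_pos, then return
    the mean number of non-reference calls across `regions` for each group."""
    marker_calls = next(calls for pos, calls in sites if pos == marker_pos)
    groups = {"carrier": [], "non-carrier": []}
    for isolate, gt in enumerate(marker_calls):
        load = sum(load_in_region(sites, isolate, r) for r in regions)
        groups["carrier" if gt >= 1 else "non-carrier"].append(load)
    return {name: (mean(loads) if loads else None) for name, loads in groups.items()}


# Toy data: four isolates, reference column already removed.
sites = [
    (14408, [1, 1, 0, 0]),
    (26300, [1, 0, 0, 0]),  # inside E
    (26600, [1, 1, 1, 0]),  # inside M
    (27000, [0, 1, 0, 0]),  # inside M
]
print(carriers_per_site(sites))
print(mean_load_by_marker(sites, 14408, [E_GENE, M_GENE]))
```

Comparing group means this way only describes the data; a real test of the hypothesis would also need a significance test and the date and location covariates.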
Provided by:
Mendeley Data



