Chi-Square Analysis of Spike Protein Mutation Clusters in SARS-CoV-2: A Global Study of 158,342 Historical Genomes from GVATLAS
Source: Figshare. Updated: 2025-07-16. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Chi-Square_Analysis_of_Spike_Protein_Mutation_Clusters_in_SARS-CoV-2_A_Global_Study_of_158_342_Historical_Genomes_from_GVATLAS/29577704/1
Description:
This dataset presents the results of a large-scale Chi-Square test of independence applied to mutation clusters in the SARS-CoV-2 spike protein, derived from 158,342 globally distributed historical viral genomes from 2022 to 2024.

Methods Overview: Raw FASTA sequences were downloaded from public repositories and processed locally using standard bioinformatics pipelines. Sequences were aligned using MAFFT, variant calling was performed using iVar and supplemented by local scripts, and BAM files were generated using SAMtools and BCFtools. Mutation profiles were extracted and uploaded to GVATLAS for each accession in this study. From the GVATLAS dataset, 158,342 accessions were selected based on quality filters and used to generate 3,823,995 unique mutation pairs. A Chi-Square test of independence was performed to assess non-random co-occurrence patterns among mutations.

Key Findings:
Total mutation pairs tested: 3,823,995
Significant pairs (p < 0.05): 47,068
Most significant pair: S:A1015S & S:L24S
Lowest p-value: 0.00 (reported as approximately 0.00e+00 after rounding)

The most significant mutation pair had a p-value of approximately 0.00, indicating highly significant linkage. The pair S:T478R & S:T547I is also reported as most significant (p ≈ 0.0). Of the ~3.8 million pairs tested, 47,068 (about 1.23%) showed statistically significant co-occurrence (p < 0.05), indicating that only a small fraction of mutation combinations are non-random. Most mutations occur independently.
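As an illustration of the test described above, the following is a minimal sketch of a Pearson Chi-Square test of independence for a single mutation pair, written with the Python standard library only. It is not the author's code: the function name `chi_square_2x2` and the example counts are hypothetical, and no continuity correction is applied (a SciPy-based analysis may use one). It also shows why a very strong association in a sample of this size can print as 0.00e+00, and rechecks the reported 1.23% share.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 table.

    a: genomes carrying both mutations, b: first mutation only,
    c: second mutation only, d: neither mutation.
    No continuity correction is applied.
    Returns (chi2 statistic, p-value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (not taken from the dataset) summing to 158,342 genomes.
chi2, p = chi_square_2x2(5000, 10, 20, 153312)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")  # p underflows and prints as 0.00e+00

# Recheck the reported share of significant pairs.
significant, tested = 47_068, 3_823_995
print(f"{100 * significant / tested:.2f}% of pairs significant")
```

Because the p-value underflows to zero in double precision for strongly linked pairs, several pairs can tie at p = 0.0; ranking such pairs by the chi-square statistic itself would be one way to separate them.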
A smaller but meaningful subset likely co-evolves under selective pressure. The results suggest potential functional or adaptive linkages among certain mutation clusters, warranting further investigation into their impact on immune evasion, transmissibility, and antigenic drift.

The non-random co-occurrence of mutations observed in this study is not unique to SARS-CoV-2. Similar mutation clustering patterns have been reported in other RNA viruses such as RSV (Respiratory Syncytial Virus), MERS-CoV, and influenza viruses, where selective pressures drive the emergence of mutation combinations that enhance viral fitness, immune evasion, or host adaptation. These evolutionary patterns reflect the shared mechanisms by which RNA viruses respond to population immunity and antigenic drift. This suggests that the mutation clusters identified here may represent conserved evolutionary strategies used by RNA viruses to adapt to changing host environments, and it underscores the importance of tracking mutation linkage in genomic surveillance of emerging pathogens.

## References
1. CDC. "Emergence of SARS-CoV-2 B.1.1.7 Lineage – United Kingdom." MMWR (2021). https://doi.org/10.15585/mmwr.mm7003e3
2. Wang, Q. et al. "SARS-CoV-2 Spike Protein Mutations and Immune Evasion." Nature Reviews Immunology (2022). https://doi.org/10.1038/s41577-00695-4
3. Agoti, C. et al. "Evolutionary Dynamics and Epidemiology of Respiratory Syncytial Virus." Virus Evolution (2021). https://doi.org/10.1093/ve/veab044
4. Corman, V.M. et al. "Tracking the emergence and evolution of MERS-CoV." Nature Reviews Microbiology (2019). https://doi.org/10.1038/s41579-00695-4
5. Bedford, T. et al. "Antigenic and Genetic Evolution of Human Influenza A(H3N2) Viruses." Journal of Virology (2021). https://doi.org/10.1128/JVI.00108-21
6. Starr, T.N. et al. "Epistasis among adaptive mutations in the influenza virus hemagglutinin." Nature Communications (2022). https://doi.org/10.1038/s41467-022-29728-w

Acknowledgements: We acknowledge the utility of GVATLAS (https://gvatlas.org) as an essential tool for mutation tracking and lineage classification. All raw data and metadata are available via GVATLAS.

Tools used: iVar, MAFFT, SAMtools, BCFtools, Python (SciPy, pandas, sklearn), and GVAtlas.org

For consultation or collaboration: TahirHB@hotmail.com
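The pair-generation step named in the methods overview is not spelled out in the description. A minimal sketch of one way to do it, assuming each accession is represented as a set of mutation labels, is given below. The function name `pair_contingency_tables`, the placeholder labels (`S:mutA` and so on), and the toy profiles are all hypothetical and are not drawn from the dataset. As an arithmetic aside, 3,823,995 equals C(2766, 2), which would be consistent with testing every pair among 2,766 distinct mutations; that reading is an inference and is not stated in the source.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pair_contingency_tables(profiles):
    """Build a 2x2 count table for every pair of observed mutations.

    profiles: iterable of collections of mutation labels, one per accession.
    Returns {(m1, m2): (both, m1_only, m2_only, neither)} with m1 < m2.
    """
    profiles = [set(p) for p in profiles]
    n = len(profiles)
    single = Counter()  # how many accessions carry each mutation
    both = Counter()    # how many accessions carry each pair together
    for muts in profiles:
        ordered = sorted(muts)
        single.update(ordered)
        both.update(combinations(ordered, 2))

    tables = {}
    for m1, m2 in combinations(sorted(single), 2):
        a = both[(m1, m2)]          # both mutations present
        b = single[m1] - a          # m1 without m2
        c = single[m2] - a          # m2 without m1
        d = n - a - b - c           # neither mutation
        tables[(m1, m2)] = (a, b, c, d)
    return tables


# Toy profiles with placeholder labels (hypothetical, for illustration only).
toy_profiles = [
    {"S:mutA", "S:mutB"},
    {"S:mutA", "S:mutB"},
    {"S:mutC"},
    set(),
    {"S:mutA"},
]
for pair, table in pair_contingency_tables(toy_profiles).items():
    print(pair, table)

# All pairs among 2,766 items: matches the reported number of pairs tested.
print(comb(2766, 2))
```

Each resulting table can then be passed to a test of independence; counting per-mutation and per-pair occurrences in a single pass avoids rescanning all accessions for every pair.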
Provider:
Bhatti, Tahir
Created:
2025-07-15



