Chi-Square Analysis of Spike Protein Mutation Clusters in SARS-CoV-2: A Global Study of 158,342 Historical Genomes from GVATLAS
Source: DataCite Commons. Updated: 2025-07-16. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/Chi-Square_Analysis_of_Spike_Protein_Mutation_Clusters_in_SARS-CoV-2_A_Global_Study_of_158_342_Historical_Genomes_from_GVATLAS/29577704/1
Resource description:
This dataset presents the results of a large-scale Chi-Square test of independence applied to mutation clusters in the SARS-CoV-2 spike protein, derived from 158,342 globally distributed historical viral genomes from 2022 to 2024.

<b>Methods Overview:</b>
Raw FASTA sequences were downloaded from public repositories and processed locally using standard bioinformatics pipelines. Reads were aligned using MAFFT, BAM files were generated using SAMtools and BCFtools, and variant calling was performed using iVar and local scripts. Mutation profiles were extracted and uploaded to GVATLAS for each accession included in this study. From the GVATLAS dataset, <b>158,342</b> accessions were selected based on quality filters and used to generate <b>3,823,995</b> unique mutation pairs. A Chi-Square test of independence was performed to assess <b>non-random co-occurrence patterns</b> among mutations.

Key Findings:

| <b>Metric</b> | <b>Value</b> |
| --- | --- |
| Total Mutation Pairs Tested | <b>3,823,995</b> |
| Significant Pairs (p < 0.05) | <b>47,068</b> |
| Most Significant Pair | <b>S:A1015S</b> & <b>S:L24S</b> |
| Lowest p-value | <b>~0.00e+00 (rounds to 0.00)</b> |

The most significant mutation pair showed a p-value of approximately 0.00, indicating highly significant linkage.
<b>47,068</b> mutation pairs showed statistically significant co-occurrence (p < 0.05).
The most significant pair was S:T478R & S:T547I (p ≈ 0.0).

Of the ~3.8 million pairs tested, ~47k showed statistically significant co-occurrence, which is about <b>1.23%</b> of all pairs. This indicates that only a small fraction of mutation combinations are <b>non-random</b>. Most mutations occur independently.
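The pairwise test described above can be illustrated with a minimal sketch. It assumes each mutation is represented as a boolean presence/absence vector over genomes and that each pair is tested on a 2×2 contingency table with `scipy.stats.chi2_contingency` (SciPy is listed among the tools, but the source does not specify the exact procedure, table layout, or whether a continuity correction was used). The function names and the toy mutations `mutA`, `mutB`, and `mutC` are hypothetical and are not taken from the dataset.

```python
import math
from itertools import combinations

import numpy as np
from scipy.stats import chi2, chi2_contingency


def pair_table(has_a, has_b):
    """2x2 table: rows = mutation A present/absent, columns = mutation B present/absent."""
    has_a = np.asarray(has_a, dtype=bool)
    has_b = np.asarray(has_b, dtype=bool)
    return np.array([
        [np.sum(has_a & has_b), np.sum(has_a & ~has_b)],
        [np.sum(~has_a & has_b), np.sum(~has_a & ~has_b)],
    ])


def chi_square_all_pairs(presence, alpha=0.05):
    """Test every unordered pair in `presence` (dict: name -> boolean array over genomes)."""
    results = []
    for a, b in combinations(sorted(presence), 2):
        table = pair_table(presence[a], presence[b])
        # Skip degenerate tables (a mutation found in all or in no genomes):
        # chi2_contingency raises ValueError when an expected count is zero.
        if (table.sum(axis=0) == 0).any() or (table.sum(axis=1) == 0).any():
            continue
        # SciPy applies Yates' continuity correction to 2x2 tables by default.
        stat, p_value, _dof, _expected = chi2_contingency(table)
        results.append({"pair": (a, b), "chi2": stat, "p": p_value,
                        "significant": bool(p_value < alpha)})
    return results


# Toy data (hypothetical): 1,000 genomes, three made-up mutations.
idx = np.arange(1000)
presence = {
    "mutA": idx < 300,
    "mutB": idx < 310,     # almost always co-occurs with mutA
    "mutC": idx % 2 == 0,  # constructed to be independent of mutA and mutB
}
results = chi_square_all_pairs(presence)
for r in results:
    print(r["pair"], f"chi2={r['chi2']:.1f}", f"p={r['p']:.3g}", r["significant"])

# A very large test statistic gives a p-value below the smallest positive
# double, which is one way a reported p-value can appear as 0.00e+00.
tiny_p = chi2.sf(2000.0, df=1)

# Share of significant pairs, using the counts reported in the description.
significant_fraction = 47_068 / 3_823_995
print(f"{significant_fraction:.2%}")

# Observation (not stated in the source): the reported pair count equals
# C(2766, 2), which would be consistent with testing all pairs of 2,766 mutations.
pairs_from_2766 = math.comb(2766, 2)
```

In the toy data only the `mutA`/`mutB` pair is flagged as significant, while both pairs involving `mutC` are not. A dictionary of boolean arrays keeps the sketch short; at the scale reported here, a sparse genome-by-mutation matrix would likely be a more practical representation.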
A smaller but meaningful subset likely co-evolves under selective pressure. The results suggest potential functional or adaptive linkages among certain mutation clusters, warranting further investigation into their impact on immune evasion, transmissibility, and antigenic drift.

The non-random co-occurrence of mutations observed in this study is not unique to SARS-CoV-2. Similar mutation clustering patterns have been reported in other RNA viruses such as <b>RSV (Respiratory Syncytial Virus)</b>, <b>MERS-CoV</b>, and <b>influenza viruses</b>, where selective pressures drive the emergence of mutation combinations that enhance viral fitness, immune evasion, or host adaptation. These evolutionary patterns reflect the shared mechanisms by which RNA viruses respond to population immunity and antigenic drift. This suggests that the mutation clusters identified here may represent conserved evolutionary strategies used by RNA viruses to adapt to changing host environments, and it underscores the importance of tracking mutation linkage in genomic surveillance of emerging pathogens.

## References
1. CDC. "Emergence of SARS-CoV-2 B.1.1.7 Lineage – United Kingdom." MMWR (2021). https://doi.org/10.15585/mmwr.mm7003e3
2. Wang, Q. et al. "SARS-CoV-2 Spike Protein Mutations and Immune Evasion." Nature Reviews Immunology (2022). https://doi.org/10.1038/s41577-00695-4
3. Agoti, C. et al. "Evolutionary Dynamics and Epidemiology of Respiratory Syncytial Virus." Virus Evolution (2021). https://doi.org/10.1093/ve/veab044
4. Corman, V.M. et al. "Tracking the emergence and evolution of MERS-CoV." Nature Reviews Microbiology (2019). https://doi.org/10.1038/s41579-00695-4
5. Bedford, T. et al. "Antigenic and Genetic Evolution of Human Influenza A(H3N2) Viruses." Journal of Virology (2021). https://doi.org/10.1128/JVI.00108-21
6. Starr, T.N. et al. "Epistasis among adaptive mutations in the influenza virus hemagglutinin." Nature Communications (2022). https://doi.org/10.1038/s41467-022-29728-w

<b>Acknowledgements:</b>
We acknowledge the utility of GVATLAS (https://gvatlas.org) as an essential tool for mutation tracking and lineage classification. All raw data and metadata are available via GVATLAS.
Tools used: iVar, MAFFT, SAMtools, BCFtools, Python (SciPy, pandas, sklearn), and GVAtlas.org
For consultation or collaboration: TahirHB@hotmail.com
Provider:
figshare
Created:
2025-07-15



