SARS-CoV-2 Spike Mutation Clusters Identified via Large-Scale Statistical Linkage Analysis: Insights into Epistatic Interactions and Evolutionary Pathways
Source: Figshare. Updated 2025-07-15; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Chi-Square_Analysis_of_Spike_Protein_Mutation_Clusters_in_SARS-CoV-2_A_Global_Study_of_158_342_Historical_Genomes_from_GVATLAS/29577704
Description:
This dataset presents the results of a large-scale mutation co-occurrence analysis of the SARS-CoV-2 spike protein, based on 158,342 high-quality genomes collected between 2022 and 2024. Mutation profiles were extracted using standard bioinformatics pipelines (MAFFT, iVar, SAMtools, BCFtools) and validated using the GVATLAS platform.

A total of 3,823,995 unique mutation pairs were tested for non-random co-occurrence using the chi-square test of independence. Of these, 47,068 pairs (approximately 1.23% of all pairs) showed statistically significant co-occurrence, indicating that while most mutations occur independently, a small but meaningful subset co-evolves under selective pressure.

The most significant pair was S:T478R & S:T547I, with a p-value reported as approximately 0.00, indicating strong linkage. Other top co-occurring pairs include S:A1015S & S:L24S, suggesting potential functional or adaptive relationships.

These findings highlight potential mutation clusters that may influence immune evasion, transmissibility, and antigenic drift, and align with similar patterns observed in other RNA viruses such as RSV, MERS-CoV, and influenza.

We have since expanded this dataset to include 6.8 million accessions, enabling a more comprehensive view of mutation dynamics across time and geographies.
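The pairwise test described above can be illustrated with a minimal sketch. It assumes each genome is reduced to a list of spike mutation labels, builds a 2x2 presence/absence table for every pair, and applies the Pearson chi-square test of independence without continuity correction. The function names `pair_tables` and `chi_square_2x2` and the toy `profiles` are hypothetical and made up for illustration; the dataset description does not say how the authors implemented the test.

```python
import math
from collections import Counter
from itertools import combinations


def pair_tables(profiles):
    """Yield (mut_a, mut_b, table) for every pair of mutations seen.

    table = ((both, only_a), (only_b, neither)), counted over all genomes.
    """
    n = len(profiles)
    single = Counter()  # genomes carrying each mutation
    joint = Counter()   # genomes carrying both mutations of a pair
    for muts in profiles:
        muts = sorted(set(muts))
        single.update(muts)
        joint.update(combinations(muts, 2))
    for a, b in combinations(sorted(single), 2):
        both = joint[(a, b)]
        only_a = single[a] - both
        only_b = single[b] - both
        neither = n - both - only_a - only_b
        yield a, b, ((both, only_a), (only_b, neither))


def chi_square_2x2(table):
    """Return (chi2, p) for a 2x2 table, Pearson statistic with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the test is undefined; report no evidence.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Toy mutation profiles (invented for illustration, not taken from the dataset).
profiles = [
    ["S:D614G", "S:N501Y", "S:P681H"],
    ["S:D614G", "S:N501Y", "S:P681H"],
    ["S:D614G", "S:N501Y", "S:P681H"],
    ["S:D614G", "S:T478K"],
    ["S:D614G", "S:T478K"],
    ["S:D614G"],
    ["S:T478K"],
    [],
]

results = {(a, b): chi_square_2x2(t) for a, b, t in pair_tables(profiles)}
for (a, b), (chi2, p) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{a} | {b} | chi2={chi2:.3f} | p={p:.4g}")
```

When millions of pairs are tested, a multiple-testing correction (for example Bonferroni) is commonly applied to the significance threshold; the description does not say whether one was used here.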
The expanded dataset allows deeper insights into the evolution of SARS-CoV-2 and provides a robust resource for future research on viral adaptation.

SARS-CoV-2 Spike Mutation Co-Occurrence Summary
Based on 5,525,457 accessions

Total mutations observed: 109,825,409
Unique mutations: 7,806

PAIRWISE ANALYSIS
-----------------
Total mutation pairs analyzed: 540,368
Significant pairs: 249,836

TOP MUTATIONS (by frequency)
-------------------------------
S:D614G    5,490,427
S:T478K    4,383,840
S:P681H    3,213,690
S:N501Y    3,166,083
S:H655Y    2,973,639
S:N679K    2,962,488
S:N969K    2,957,849
S:Q954H    2,945,931
S:G142D    2,935,728
S:D796Y    2,922,508

TOP CO-OCCURRING MUTATION PAIRS (by count)
------------------------------------------------------------
S:D614G | S:T478K | 4,370,242
S:D614G | S:P681H | 3,204,312
S:D614G | S:N501Y | 3,156,208
S:D614G | S:H655Y | 2,967,845
S:D614G | S:N679K | 2,954,582
S:D614G | S:N969K | 2,947,503
S:N969K | S:Q954H | 2,942,025
S:N501Y | S:P681H | 2,939,596
S:N679K | S:N969K | 2,939,375
S:D614G | S:Q954H | 2,935,645

STRONGEST CO-OCCURRING MUTATION PAIRS (by Cramér's V)
------------------------------------------------------------
S:N969K | S:Q954H  | 0.702040
S:S373P | S:S375F  | 0.701917
S:L981F | S:N856K  | 0.701013
S:L981F | S:T547K  | 0.700331
S:A570D | S:D1118H | 0.700198
S:L452W | S:N481K  | 0.699967
S:A570V | S:P621S  | 0.699760
S:N856K | S:T547K  | 0.698985
S:A570V | S:E554K  | 0.697428
S:A570D | S:S982A  | 0.696944
-------------------------------------------------

Interpretation:
The 249,836 significant mutation pairs indicate that while many mutations occur independently, a large number show non-random co-evolution patterns.

Many top pairs involve mutations located in key domains such as:
  Receptor Binding Domain (RBD): e.g., N501Y, T478K
  Furin cleavage site: e.g., P681H
  NTD and other antigenic sites: e.g., G142D, S371L

This supports the hypothesis that selective immune pressure and viral adaptation drive the emergence of specific mutation combinations.

For inquiries regarding genomic surveillance and collaborative research, please contact: TahirHB@gotmail.com
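The summary ranks pairs by Cramér's V as well as by count. With millions of genomes, even a weak association can yield a very small p-value, so an effect-size measure is a natural complement to the chi-square test. The sketch below uses the textbook definition V = sqrt(chi2 / (n * min(r - 1, c - 1))); the function names and the toy tables are hypothetical, and the description does not state the exact variant or normalization used to produce the values listed above.

```python
import math


def chi_square_statistic(table):
    """Return (chi2, n) for an r x c table of counts (Pearson, no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0:
        return 0.0, 0
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip cells whose row or column is empty
                chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V in [0, 1]: 0 means no association, 1 means perfect association."""
    chi2, n = chi_square_statistic(table)
    k = min(len(table), len(table[0])) - 1
    if n == 0 or k == 0:
        return 0.0
    return math.sqrt(chi2 / (n * k))


# Toy 2x2 tables, laid out as ((both, only_a), (only_b, neither)).
# The labels and counts are invented for illustration only.
toy_pairs = {
    "perfectly linked": ((50, 0), (0, 50)),
    "partially linked": ((40, 10), (10, 40)),
    "independent": ((25, 25), (25, 25)),
}

ranked = sorted(
    ((name, cramers_v(table)) for name, table in toy_pairs.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, v in ranked:
    print(f"{name}: V = {v:.3f}")
```

For a 2x2 presence/absence table, min(r - 1, c - 1) is 1, so V reduces to sqrt(chi2 / n), the absolute value of the phi coefficient.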
Created:
2025-07-15



