VYD2311+ SARS-CoV-2 Co-occurrence Mutation Atlas (2019 - 2025): 3M+ Genomes, 158M Mutations - Full Pandemic Surveillance Dataset
Source: Zenodo. Updated: 2025-11-29. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.17760295
Description:
This dataset provides a comprehensive, amino acid-level co-occurrence analysis of mutations in 3,064,039 VYD2311+ SARS-CoV-2 genomes (defined by the presence of ≥1 of: R346T, S371F, K444T, N460K, F486P in Spike) collected globally from 2019 to 2025.
Generated using a scalable, chunked bioinformatics pipeline, the analysis:
- Extracts all non-synonymous mutations across 10 viral genes (S, N, M, E, ORF3a, ORF6, ORF7a, ORF7b, ORF8, ORF1a) relative to the Wuhan-Hu-1 reference.
- Identifies 158.2 million co-occurring mutations alongside VYD2311 signature profiles.
- Includes per-genome constellation metadata (1/5 to 5/5 VYD2311 mutations) for stratified analysis.
- Processes >9 million translated sequences.
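The constellation label (1/5 to 5/5) follows directly from the five signature Spike substitutions listed in the description. The snippet below is a minimal sketch of that counting rule, not the author's pipeline: the function names and the input format (strings such as "R346T") are assumptions made for illustration.

```python
import re

# Signature Spike substitutions named in the dataset description:
# position -> (reference residue, mutant residue).
VYD2311_SIGNATURE = {
    346: ("R", "T"),
    371: ("S", "F"),
    444: ("K", "T"),
    460: ("N", "K"),
    486: ("F", "P"),
}


def constellation_count(spike_mutations):
    """Count how many of the five signature substitutions are present.

    `spike_mutations` is an iterable of strings like "R346T"
    (a hypothetical input format chosen for this sketch).
    """
    present = set()
    for mut in spike_mutations:
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mut)
        if m is None:
            continue  # skip anything that is not a simple substitution
        ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
        if VYD2311_SIGNATURE.get(pos) == (ref, alt):
            present.add(pos)  # a set, so repeated entries count once
    return len(present)


def is_vyd2311_positive(spike_mutations):
    """A genome qualifies as VYD2311+ with at least one signature substitution."""
    return constellation_count(spike_mutations) >= 1


example = ["D614G", "S371F", "N460K", "F486P"]
print(f"{constellation_count(example)}/5")  # prints "3/5"
```

A genome carrying only S:D614G would return 0 and fall outside the dataset under this rule.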
Top recurrent mutations include S:D614G (99.7%), ORF1a:P3395H (99.3%), and S:S371F (98.6%), reflecting dominant pandemic lineages.
Global Summary:
CORRECTED GLOBAL SUMMARY: VYD2311 Co-occurrence (2019–2025)
==============================================================
Total VYD2311+ genomes: 3,064,039
Total co-occurring mutations: 158,200,000
--- Corrected Constellation Distribution (Unique Genomes) ---
1/5: 2,025,789 (66.11%)
2/5: 177,921 (5.81%)
3/5: 293,452 (9.58%)
4/5: 565,958 (18.47%)
5/5: 919 (0.03%)
--- Top 20 Co-occurring Mutations ---
S:614 D>G: 3,053,701 (99.7%) [Gene: S]
ORF1a:3395 P>H: 3,042,849 (99.3%) [Gene: ORF1a]
ORF3a:223 T>I: 3,037,146 (99.1%) [Gene: ORF3a]
E:9 T>I: 3,035,040 (99.1%) [Gene: E]
ORF1a:1307 G>S: 3,034,541 (99.0%) [Gene: ORF1a]
M:63 A>T: 3,032,166 (99.0%) [Gene: M]
S:679 N>K: 3,025,940 (98.8%) [Gene: S]
ORF1a:3255 T>I: 3,025,556 (98.7%) [Gene: ORF1a]
S:373 S>P: 3,025,251 (98.7%) [Gene: S]
S:969 N>K: 3,024,943 (98.7%) [Gene: S]
S:954 Q>H: 3,021,562 (98.6%) [Gene: S]
S:371 S>F: 3,021,095 (98.6%) [Gene: S]
S:655 H>Y: 3,020,598 (98.6%) [Gene: S]
S:405 D>N: 3,020,595 (98.6%) [Gene: S]
S:375 S>F: 3,020,513 (98.6%) [Gene: S]
N:13 P>L: 3,020,414 (98.6%) [Gene: N]
S:376 T>A: 3,017,809 (98.5%) [Gene: S]
ORF1a:3027 L>F: 3,001,698 (98.0%) [Gene: ORF1a]
S:796 D>Y: 3,000,679 (97.9%) [Gene: S]
ORF1a:135 S>R: 3,000,530 (97.9%) [Gene: ORF1a]
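The figures in the summary can be cross-checked with a few lines of arithmetic. The sketch below parses one summary line in the "GENE:POS REF>ALT: COUNT (PCT%)" layout shown above, recomputes its percentage against the genome total, and confirms that the constellation counts add up to that total. The regular expression and helper name are assumptions for illustration; they are not part of the published pipeline.

```python
import re

TOTAL_GENOMES = 3_064_039  # "Total VYD2311+ genomes" from the summary

# Constellation counts (k of 5 signature mutations) from the summary.
CONSTELLATION_COUNTS = {1: 2_025_789, 2: 177_921, 3: 293_452, 4: 565_958, 5: 919}

# Matches lines such as "S:614 D>G: 3,053,701 (99.7%) [Gene: S]".
LINE_RE = re.compile(
    r"(?P<gene>\w+):(?P<pos>\d+) (?P<ref>[A-Z])>(?P<alt>[A-Z]): "
    r"(?P<count>[\d,]+) \((?P<pct>[\d.]+)%\)"
)


def parse_mutation_line(line):
    """Turn one summary line into a dict (hypothetical helper)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    return {
        "gene": m["gene"],
        "label": f"{m['gene']}:{m['ref']}{m['pos']}{m['alt']}",
        "count": int(m["count"].replace(",", "")),
        "reported_pct": float(m["pct"]),
    }


rec = parse_mutation_line("S:614 D>G: 3,053,701 (99.7%) [Gene: S]")
recomputed_pct = round(100 * rec["count"] / TOTAL_GENOMES, 1)
print(rec["label"], recomputed_pct)  # prints "S:D614G 99.7"

# The five constellation classes should partition the full genome set.
print(sum(CONSTELLATION_COUNTS.values()) == TOTAL_GENOMES)  # prints "True"
```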
Files:
- 95 × .tar.zst archives (Zenodo-optimized, ~70–150 MiB each) containing TSV records of mutation co-occurrences.
- Global and per-chunk summaries with constellation distributions and top mutation frequencies.
- Attribution and methodology metadata.
Ideal for studies on antibody escape, variant evolution, epistasis, and therapeutic resistance in SARS-CoV-2.
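Once an archive has been decompressed and unpacked (for example with the zstd and tar command-line tools), the TSV records can be tallied with the standard library alone. The sketch below counts values in one column of a tab-separated stream. The column names `genome_id` and `mutation` are hypothetical, since the record does not document the TSV header; check the actual files before relying on them.

```python
import csv
import io
from collections import Counter


def tally_column(tsv_lines, column):
    """Count how often each value appears in `column` of a TSV stream.

    `tsv_lines` is any iterable of text lines with a header row,
    such as an open file object.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    counts = Counter()
    for row in reader:
        counts[row[column]] += 1
    return counts


# Tiny in-memory stand-in for one extracted TSV file
# (column names are assumptions, not the dataset's documented schema).
sample = io.StringIO(
    "genome_id\tmutation\n"
    "G1\tS:D614G\n"
    "G2\tS:D614G\n"
    "G2\tN:P13L\n"
)
counts = tally_column(sample, "mutation")
print(counts.most_common(1))  # prints "[('S:D614G', 2)]"
```

Because the reader streams row by row, the same function can be applied to each extracted file in turn and the resulting counters summed, which keeps memory use low across all 95 archives.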
Generated by: Tahir Bhatti
Comments and queries welcomed: TahirHB@hotmail.com
Provider:
Zenodo
Created:
2025-11-29



