
VYD2311+SARS-CoV-2 Co-occurrence Mutation Atlas (2019 - 2025): 3M+ Genomes, 158M Mutations - Full Pandemic Surveillance Dataset

Source: Zenodo | Updated: 2025-11-29 | Indexed: 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.17760295
Resource description:
This dataset provides a comprehensive, amino acid-level co-occurrence analysis of mutations in 3,064,039 VYD2311+ SARS-CoV-2 genomes collected globally from 2019 to 2025. A genome is classed as VYD2311+ if its Spike protein carries at least one of R346T, S371F, K444T, N460K, or F486P.

The analysis was generated with a scalable, chunked bioinformatics pipeline that processed more than 9 million translated sequences. It:
- Extracts all non-synonymous mutations across 10 viral genes (S, N, M, E, ORF3a, ORF6, ORF7a, ORF7b, ORF8, ORF1a) relative to the Wuhan-Hu-1 reference.
- Identifies 158.2 million co-occurring mutations alongside the VYD2311 signature profiles.
- Includes per-genome constellation metadata (1/5 to 5/5 VYD2311 signature mutations) for stratified analysis.

Top recurrent mutations include S:D614G (99.7%), ORF1a:P3395H (99.3%), and S:S371F (98.6%), reflecting the dominant pandemic lineages.

Global summary:

CORRECTED GLOBAL SUMMARY: VYD2311 Co-occurrence (2019–2025)
==============================================================
Total VYD2311+ genomes: 3,064,039
Total co-occurring mutations: 158,200,000

--- Corrected Constellation Distribution (Unique Genomes) ---
  1/5: 2,025,789 (66.11%)
  2/5: 177,921 (5.81%)
  3/5: 293,452 (9.58%)
  4/5: 565,958 (18.47%)
  5/5: 919 (0.03%)

--- Top 20 Co-occurring Mutations ---
  S:614 D>G: 3,053,701 (99.7%) [Gene: S]
  ORF1a:3395 P>H: 3,042,849 (99.3%) [Gene: ORF1a]
  ORF3a:223 T>I: 3,037,146 (99.1%) [Gene: ORF3a]
  E:9 T>I: 3,035,040 (99.1%) [Gene: E]
  ORF1a:1307 G>S: 3,034,541 (99.0%) [Gene: ORF1a]
  M:63 A>T: 3,032,166 (99.0%) [Gene: M]
  S:679 N>K: 3,025,940 (98.8%) [Gene: S]
  ORF1a:3255 T>I: 3,025,556 (98.7%) [Gene: ORF1a]
  S:373 S>P: 3,025,251 (98.7%) [Gene: S]
  S:969 N>K: 3,024,943 (98.7%) [Gene: S]
  S:954 Q>H: 3,021,562 (98.6%) [Gene: S]
  S:371 S>F: 3,021,095 (98.6%) [Gene: S]
  S:655 H>Y: 3,020,598 (98.6%) [Gene: S]
  S:405 D>N: 3,020,595 (98.6%) [Gene: S]
  S:375 S>F: 3,020,513 (98.6%) [Gene: S]
  N:13 P>L: 3,020,414 (98.6%) [Gene: N]
  S:376 T>A: 3,017,809 (98.5%) [Gene: S]
  ORF1a:3027 L>F: 3,001,698 (98.0%) [Gene: ORF1a]
  S:796 D>Y: 3,000,679 (97.9%) [Gene: S]
  ORF1a:135 S>R: 3,000,530 (97.9%) [Gene: ORF1a]

Files:
- 95 × .tar.zst archives (Zenodo-optimized, ~70–150 MiB each) containing TSV records of mutation co-occurrences.
- Global and per-chunk summaries with constellation distributions and top mutation frequencies.
- Attribution and methodology metadata.

The dataset is suited to studies of antibody escape, variant evolution, epistasis, and therapeutic resistance in SARS-CoV-2.

Generated by: Tahir Bhatti
Comments and queries welcome: TahirHB@hotmail.com
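The constellation label (1/5 to 5/5) records how many of the five signature Spike mutations a genome carries. The Python sketch below illustrates that classification. It makes three assumptions:
- Mutations are available as strings in the "GENE:POS REF>ALT" notation used in the summary. The TSV column layout inside the archives is not described here.
- Only genomes that score at least 1 count as VYD2311+.
- The function names (parse_mutation, constellation_score, summarise, constellation_distribution) are hypothetical. They are not part of the dataset or its pipeline.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration; not taken from the dataset's pipeline.

# The five VYD2311 signature Spike substitutions named in the dataset
# definition (R346T, S371F, K444T, N460K, F486P) as (gene, pos, ref, alt).
VYD2311_SIGNATURE = frozenset({
    ("S", 346, "R", "T"),
    ("S", 371, "S", "F"),
    ("S", 444, "K", "T"),
    ("S", 460, "N", "K"),
    ("S", 486, "F", "P"),
})

# Assumed token format, copied from the summary: "GENE:POS REF>ALT",
# e.g. "S:614 D>G" or "ORF1a:3395 P>H". "*" allows stop codons.
_MUTATION_RE = re.compile(r"^(\w+):(\d+) ([A-Z*])>([A-Z*])$")


def parse_mutation(token: str) -> tuple[str, int, str, str]:
    """Split one 'GENE:POS REF>ALT' token into (gene, position, ref, alt)."""
    match = _MUTATION_RE.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation token: {token!r}")
    gene, pos, ref, alt = match.groups()
    return gene, int(pos), ref, alt


def constellation_score(mutations: list[str]) -> int:
    """Count distinct signature mutations (0-5) present in one genome."""
    parsed = {parse_mutation(m) for m in mutations}
    return len(parsed & VYD2311_SIGNATURE)


def summarise(counts: dict[int, int]) -> dict[str, tuple[int, float]]:
    """Turn {k: genomes} into {'k/5': (genomes, percent of all counted)}."""
    total = sum(counts.values())
    return {
        f"{k}/5": (n, round(100 * n / total, 2))
        for k, n in sorted(counts.items())
    }


def constellation_distribution(genomes: list[list[str]]) -> dict[str, tuple[int, float]]:
    """Keep VYD2311+ genomes (score >= 1) and tally them by k/5 constellation."""
    scores = Counter(constellation_score(g) for g in genomes)
    positives = {k: n for k, n in scores.items() if k >= 1}
    return summarise(positives)


if __name__ == "__main__":
    toy = [
        ["S:614 D>G", "S:371 S>F", "S:346 R>T"],  # 2/5
        ["S:614 D>G"],                            # 0/5, not VYD2311+
        ["S:346 R>T", "S:371 S>F", "S:444 K>T",
         "S:460 N>K", "S:486 F>P"],               # 5/5
    ]
    print(constellation_distribution(toy))
    # {'2/5': (1, 50.0), '5/5': (1, 50.0)}
```

The published constellation counts sum to the stated total of 3,064,039 VYD2311+ genomes. Passing them to summarise reproduces the reported percentages (66.11, 5.81, 9.58, 18.47, 0.03).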
Provider:
Zenodo
Created:
2025-11-29