VP24, VP35, and Glycoprotein Mutation Burden Across 28 Public Zaire ebolavirus Genomes Span=2014 - 2023
Source: Figshare. Updated: 2025-11-28. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/VP24_VP35_and_Glycoprotein_Mutation_Burden_Across_28_Public_Zaire_ebolavirus_Genomes_Span_2014_-_2023/30738113
Description:
This dataset documents a systematic attempt to perform protein-level mutation burden analysis on three key Zaire ebolavirus proteins, namely VP24, VP35, and glycoprotein (GP), using 28 publicly available genome assemblies from global surveillance efforts (2014 - 2023). Despite a robust computational pipeline designed to extract coding sequences (CDS), translate them, and align them to reference proteins (AHX24653.1, AAD14582.1, AAD14585.1), consistent extraction of full-length VP24 and GP was not possible due to pervasive 5′-end truncation in non-reference submissions.

All non-KJ660346 genomes exhibited a uniform 84-nucleotide deletion at the 5′ terminus, shifting genomic coordinates and rendering standard CDS annotations invalid. VP35 (located near the 5′ end) could be partially analyzed, although it yielded an artifactual ~94% divergence when misaligned. VP24 and GP extraction failed across all truncated genomes, despite their biological presence in the sequence data. Only a direct comparison of two full-length protein sequences (Makona 2014 vs. Mayinga 1976) revealed a single conservative mutation (K212M in VP24), visualized structurally using PDB 5F1B.

This dataset includes:
- Raw mutation burden TSVs (including failed extractions)
- Per-sequence FASTA and QC reports
- Structural visualizations / images (PyMOL etc.)
- Metadata mappings and coordinate logs

We openly share these results not as a successful analysis, but as a cautionary audit: global genomic surveillance data, while abundant, often lacks the completeness and annotation consistency required for cross-isolate protein-level inference. Reproducible evolutionary virology demands not just more sequences, but better-structured, full-length, and accurately annotated genomes.

No biological interpretation is provided. All outputs reflect direct computational results using best-practice tools under documented data constraints.

Study by: TahirHB@GVAtlas.Org
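The coordinate shift and the substitution notation described above can be illustrated with a minimal Python sketch. It is not the dataset author's pipeline: the function names `shift_cds`, `point_mutations`, and `mutation_burden` are hypothetical, the CDS coordinates in the usage note are placeholders rather than real Zaire ebolavirus annotations, and the protein strings are toy sequences. The only figure taken from the description is the 84-nucleotide 5′ truncation.

```python
from typing import List, Optional, Tuple

# 84 nt missing from the 5' terminus, as stated in the dataset description.
TRUNCATION = 84


def shift_cds(start: int, end: int, offset: int = TRUNCATION) -> Optional[Tuple[int, int]]:
    """Map 1-based inclusive reference CDS coordinates onto a 5'-truncated genome.

    Returns None when the CDS starts inside the deleted region, since the
    full-length coding sequence cannot then be recovered.
    """
    if start <= offset:
        return None
    return start - offset, end - offset


def point_mutations(ref: str, query: str) -> List[str]:
    """List substitutions (e.g. 'K212M') between two equal-length, gap-free proteins."""
    if len(ref) != len(query):
        raise ValueError("sequences must be the same length; align them first")
    return [
        f"{r}{pos}{q}"
        for pos, (r, q) in enumerate(zip(ref, query), start=1)
        if r != q
    ]


def mutation_burden(ref: str, query: str) -> float:
    """Fraction of positions that differ between reference and query."""
    return len(point_mutations(ref, query)) / len(ref)
```

As a usage example, `shift_cds(1000, 1755)` maps the placeholder reference interval to `(916, 1671)` in a truncated genome, and `point_mutations("MAKAT", "MAMAT")` reports `["K3M"]` for the toy sequences. Applying unshifted reference coordinates to a truncated genome would instead slice the wrong region, which is one plausible route to the kind of artifactual divergence the description reports.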
Created:
2025-11-28



