
VP24, VP35, and Glycoprotein Mutation Burden Across 28 Public Zaire ebolavirus Genomes Span=2014 - 2023

Source: DataCite Commons. Updated: 2025-11-28. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/VP24_VP35_and_Glycoprotein_Mutation_Burden_Across_28_Public_Zaire_ebolavirus_Genomes_Span_2014_-_2023/30738113/1
Resource description:
This dataset documents a systematic attempt to perform protein-level mutation burden analysis on three key Zaire ebolavirus proteins, <b>VP24</b>, <b>VP35</b>, and glycoprotein (<b>GP</b>), using <b>28</b> publicly available genome assemblies from global surveillance efforts (2014 - 2023). Despite a robust computational pipeline designed to extract coding sequences (CDS), translate them, and align them to reference proteins (AHX24653.1, AAD14582.1, AAD14585.1), consistent extraction of full-length VP24 and GP was not possible due to pervasive 5′-end truncation in non-reference submissions.<br>All non-KJ660346 genomes exhibited a uniform 84-nucleotide deletion at the 5′ terminus, shifting genomic coordinates and rendering standard CDS annotations invalid.<br>While VP35 (located near the 5′ end) could be partially analyzed (yielding an artifactual ~94% divergence when misaligned), VP24 and GP extraction failed across all truncated genomes, despite their biological presence in the sequence data. Only a direct comparison of two full-length protein sequences (Makona 2014 vs. Mayinga 1976) revealed a single conservative mutation (K212M in VP24), visualized structurally using PDB 5F1B.<br><b>This dataset includes</b>:<br>Raw mutation burden TSVs (including failed extractions)<br>Per-sequence FASTA and QC reports<br>Structural visualizations / images (PyMOL etc.)<br>Metadata mappings and coordinate logs<br>We openly share these results not as a successful analysis, but as a cautionary audit: global genomic surveillance data, while abundant, often lacks the completeness and annotation consistency required for cross-isolate protein-level inference. Reproducible evolutionary virology demands not just more sequences, but better-structured, full-length, and accurately annotated genomes.<br>No biological interpretation is provided. All outputs reflect direct computational results using best-practice tools under documented data constraints.<br>Study by: TahirHB@GVAtlas.Org
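The coordinate problem described above can be illustrated with a minimal Python sketch. It is not the dataset's pipeline: the helper names `shift_cds` and `extract`, the toy sequences, and the CDS position 101..109 are all hypothetical, and the only value taken from the description is the 84-nucleotide 5′ deletion. The sketch shows that reference CDS coordinates applied unchanged to a truncated genome return the wrong region, while subtracting the deletion length recovers the intended one.

```python
from typing import Optional, Tuple

TRUNCATION = 84  # 5'-end deletion length reported in the dataset description


def shift_cds(start: int, end: int, offset: int = TRUNCATION) -> Optional[Tuple[int, int]]:
    """Map 1-based inclusive reference CDS coordinates onto a genome that
    lacks `offset` nucleotides at its 5' end (hypothetical helper).

    Returns None when the CDS starts inside the deleted region, because a
    full-length CDS cannot be recovered in that case.
    """
    if start <= offset:
        return None
    return start - offset, end - offset


def extract(genome: str, start: int, end: int) -> str:
    """Slice a 1-based inclusive interval out of a nucleotide string."""
    return genome[start - 1:end]


# Toy data (not real Ebola sequence): a 9-nt "CDS" at reference positions 101..109.
reference = "C" * 100 + "ATGGCCTAA" + "G" * 100
truncated = reference[TRUNCATION:]  # simulate the 5' truncation

naive = extract(truncated, 101, 109)                 # reference coordinates, unadjusted
adjusted = extract(truncated, *shift_cds(101, 109))  # coordinates shifted by 84

print(naive)     # wrong region
print(adjusted)  # intended CDS
```

In practice the offset would need to be verified per genome (for example by aligning each assembly to the reference) rather than assumed to be constant.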
Provider:
figshare
Created:
2025-11-28