
Coat protein (CP) and trimmed replication-associated protein (Rep) amino acid alignments, phylogenetic analyses, and associated metadata for ICTV-approved begomovirus RefSeq species exemplars

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/8338471
Resource description:
DATA RETRIEVAL Annotated begomovirus coding sequences for each begomovirus species exemplar with a RefSeq accession number listed in the ICTV Virus Metadata Resource (VMR #18, 2021-10-19, https://ictv.global/vmr) were downloaded from GenBank in protein FASTA format. CP and Rep amino acid sequences were extracted and split into separate data sets for analysis. We confirmed the identity of misannotated ORF products with a BLAST search. For exemplar sequences lacking ORF annotations (listed in the metadata spreadsheet), ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) was used to identify the CP and Rep ORFs, which were then translated and, after BLAST confirmation, added to the corresponding data set.

ALIGNMENTS Multiple sequence alignments were constructed with MUSCLE (Edgar, 2004) as implemented in MEGA 11 (Tamura et al., 2021) and manually corrected in AliView v1.26 (Larsson, 2014). After an initial inspection of the alignments, exemplars whose CP or Rep sequences were either severely truncated (i.e., shorter than 50% of the average length of the protein) or very divergent (i.e., so divergent that protein homology was doubtful) were excluded from the data set. Because the N- and C-terminal ends of Rep were difficult to align, the Rep alignment was trimmed to remove all residues upstream of the iteron-related domain at the N-terminus (the known Rep functional region closest to the Rep start; Arguello-Astorga & Ruiz-Medrano, 2001) and all residues downstream of a conserved geminivirus motif near the C-terminus, which corresponds to the position where other circular, Rep-encoding single-stranded DNA viruses possess an arginine finger motif (Kazlauskas et al., 2019; Krupovic et al., 2020). In total, our CP and Rep data sets contained amino acid sequences from 432 begomovirus species exemplars that met our inclusion criteria.
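The length-based exclusion rule and the Rep column trimming described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code (the alignments were inspected and edited by hand in AliView): it assumes each alignment is held as a dict mapping exemplar names to gapped sequence strings, that "average length of the protein" means the mean ungapped length over the whole data set (including the sequences being tested), and that the 0-based column indices bounding the iteron-related domain and the conserved C-terminal motif have already been located by inspection. The function names and toy data are hypothetical.

```python
from statistics import mean


def exclude_truncated(alignment: dict[str, str], min_fraction: float = 0.5) -> tuple[dict[str, str], list[str]]:
    """Drop sequences whose ungapped length is below min_fraction of the mean.

    Assumption: the mean is taken over all sequences in the data set,
    including those that end up excluded.
    """
    # Ungapped length of each sequence (alignment gaps are '-').
    lengths = {name: len(seq.replace("-", "")) for name, seq in alignment.items()}
    threshold = min_fraction * mean(lengths.values())
    kept = {name: seq for name, seq in alignment.items() if lengths[name] >= threshold}
    excluded = [name for name in alignment if lengths[name] < threshold]
    return kept, excluded


def trim_columns(alignment: dict[str, str], first_col: int, last_col: int) -> dict[str, str]:
    """Keep only alignment columns first_col..last_col (0-based, inclusive).

    Hypothetical usage: first_col is the column where the iteron-related
    domain begins and last_col is the last column of the conserved
    C-terminal motif, both identified beforehand by inspection.
    """
    return {name: seq[first_col:last_col + 1] for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"exA": "M" * 100, "exB": "M" * 90 + "-" * 10, "exC": "M" * 30 + "-" * 70}
    kept, excluded = exclude_truncated(toy)  # mean 73.3, threshold 36.7
    print(excluded)  # → ['exC']
    print(trim_columns({"exA": "ABCDEFG"}, 2, 4))  # → {'exA': 'CDE'}
```

Computing the threshold from ungapped lengths keeps alignment gaps from inflating short sequences; whether the original mean excluded the truncated sequences themselves is not stated, so the `min_fraction` cut-off here should be read as an approximation of the rule.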
PHYLOGENETIC ANALYSIS Maximum likelihood (ML) trees were inferred with IQ-TREE v2.0.7 (Minh et al., 2020) using the best-fitting substitution model selected by the built-in ModelFinder feature (Kalyaanamoorthy et al., 2017). Tree inference used 3000 ultrafast bootstrap (UFBoot) replicates, a perturbation strength of 0.2, and a stopping rule of 500 unsuccessful iterations (i.e., the search stopped after 500 consecutive iterations without an improvement to the local optimum). The -bnni flag was enabled to reduce the risk of overestimating UFBoot branch supports due to severe model violations. The phylogenies are provided in NEXUS format, are midpoint-rooted, and have branches colored by the traditional begomovirus geographic groupings: exemplars sampled in the Americas in orange and exemplars sampled in the 'Africa, Asia, Europe and Oceania' (AAEO) region in blue.

METADATA Metadata are included for each ICTV-approved species exemplar (n = 445): country of isolation, geographic designation (i.e., AAEO/Americas), genome segmentation (i.e., monopartite/bipartite), presence/absence of the V2/AV2 gene, and length of the genome/DNA-A segment. Exemplars not incorporated into the other analyses are highlighted in red in the spreadsheet.
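As an illustration only, the tree-search settings above could map onto an IQ-TREE 2 command line roughly as follows; the Python snippet just assembles and prints the argument list without running anything. The executable name `iqtree2`, the alignment and output file names, and the use of `-m MFP` to invoke ModelFinder are assumptions rather than details taken from the source. `-B`, `-bnni`, `-pers` and `-nstop` are the IQ-TREE options for ultrafast bootstrap replicates, NNI refinement of UFBoot trees, perturbation strength, and the number of unsuccessful iterations before stopping; check them against `iqtree2 -h` for the installed version. Midpoint rooting and branch coloring are not part of this command.

```python
import shlex


def iqtree_command(alignment: str, prefix: str) -> list[str]:
    """Assemble an IQ-TREE 2 argument list mirroring the settings described above.

    The executable name and file names are hypothetical placeholders.
    """
    return [
        "iqtree2",
        "-s", alignment,      # input amino acid alignment
        "-m", "MFP",          # ModelFinder: pick best-fit model, then infer the ML tree
        "-B", "3000",         # ultrafast bootstrap (UFBoot) replicates
        "-bnni",              # refine UFBoot trees by NNI on bootstrap alignments
        "-pers", "0.2",       # perturbation strength for randomized NNI
        "-nstop", "500",      # unsuccessful iterations before the search stops
        "--prefix", prefix,   # prefix for output files
    ]


if __name__ == "__main__":
    cmd = iqtree_command("CP_alignment.fasta", "CP_ml")
    print(shlex.join(cmd))
    # → iqtree2 -s CP_alignment.fasta -m MFP -B 3000 -bnni -pers 0.2 -nstop 500 --prefix CP_ml
```

Building the arguments as a list rather than a single string avoids shell-quoting problems; passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming IQ-TREE 2 is on the PATH.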
Created:
2024-09-27