
Underlying data for "Mapping glycoprotein structure reveals Flaviviridae evolutionary history"

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10616317
Description:
This repository houses the underlying data for "Mapping glycoprotein structure reveals Flaviviridae evolutionary history", authored by Jonathon C.O. Mifsud, Spyros Lytras, Michael R. Oliver, Kamilla Toon, Vincenzo A. Costa, Edward C. Holmes, and Joe Grove.

The dataset is organised into several directories:

- flaviviridae_foldseek_output: Contains the Foldseek output and the parsing scripts that extract the lowest e-value hit for each taxon and reference.
- flaviviridae_structure_blocks: Contains the Flaviviridae structures generated by ColabFold and ESMFold. Structures are organised by taxon and numbered by block. Polyprotein sequences were broken into 300 residue blocks, each overlapping by 100 residues. Numbering starts at Block_0 (residues 1-300) and continues sequentially (e.g. Block_1 = residues 100-400, Block_2 = residues 200-500, ...). This dataset constitutes the Flaviviridae protein foldome referred to in the main text.
- foldseek_reference_structures: Contains all structures used as references in the Foldseek analysis, including the Bole Tick Virus 4 proteins described in Figure 3.
- glycoprotein_structural_alignments_and_trees: Contains all files needed to replicate the trees for the E, E1 and E2 glycoproteins. The underlying code can be found in structural_alignments_code.ipynb. This directory also contains the complete glycoprotein structure predictions (refolded_fullglyco).
- ns5b_alignments_and_trees: Contains all alignment files, both trimmed and untrimmed, for the NS5b RdRp. These include alignment variants produced with different parameters and methods, and the alignments used in the stratified MUSCLE analysis. Related scripts are also included.
- sequence_benchmarks: Contains the files and scripts underlying the sequence benchmark analysis.
- sequences: Holds sequence files, including full genome sequences of Flaviviridae in .fasta format, novel sequences identified in our study, and protein sequences extracted for alignment purposes. It also contains the script for creating the sequence blocks used in the main analyses.
- stratified_MUSCLE_analysis: Contains the files and scripts needed to replicate the stratified MUSCLE analysis. The underlying tree files are located in ns5b_alignments_and_trees.
- t2rnase_alignments_and_trees: Contains all alignment and tree files, both trimmed and untrimmed, for RNase T2.
- tables: Provides metadata tables, including InterPro domain annotations, RNase T2 analysis summaries, phylogenetic model finder for the glycoprotein structural phylogenetics, and novel viruses identified through data mining.
- workflows: PDF flowchart diagrams illustrating the workflows behind the main analyses performed in our study. To orientate readers, the diagrams refer to the underlying data and scripts (as included in this repository) and to the resultant figure panels in the paper.

Note: structure and sequence names are prefixed by a four-letter code denoting the sub-clade/classification of the taxon.

Flavi-Jingmen Clade
- FJTB = Flavi-Jingmen Tick-Borne
- FJMB = Flavi-Jingmen Mosquito-Borne
- FJNV = Flavi-Jingmen No Known Vector
- FJIS = Flavi-Jingmen Insect Only
- FJAF = Flavi-Jingmen Aquatic Flavivirus
- FJFL = Flavi-Jingmen Flavi-Like
- FJJI = Flavi-Jingmen Jingmenvirus
- FJUN = Flavi-Jingmen Unclassified

Pesti-LGF Clade
- PLLG = Pesti-LGF Large Genome Flavivirus
- PLPV = Pesti-LGF Pestivirus
- PLUN = Pesti-LGF Unclassified

Hepaci-Pegi Clade
- HPPV = Hepaci-Pegi Pegivirus
- HPHV = Hepaci-Pegi Hepacivirus
- HPUN = Hepaci-Pegi Unclassified

Out group
- TOMB = Tombusvirus out group

For any queries, please contact the corresponding author at joe.grove@glasgow.ac.uk
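The naming conventions in the description can be illustrated with a short Python sketch. This is not code from the repository: the function names `classify` and `block_range` are hypothetical, and the example name `HPHV_example` is made up. The exact file-name layout is not given in the description, so the sketch reads only the first four characters of a name. The default step of 100 residues is an assumption taken from the listed examples (Block_1 = residues 100-400, Block_2 = residues 200-500), and ranges are returned as 0-based, half-open intervals, so they may differ by one residue from the coordinates used by the authors' own block script.

```python
from __future__ import annotations

# Four-letter prefixes exactly as listed in the dataset description.
PREFIXES = {
    "FJTB": "Flavi-Jingmen Tick-Borne",
    "FJMB": "Flavi-Jingmen Mosquito-Borne",
    "FJNV": "Flavi-Jingmen No Known Vector",
    "FJIS": "Flavi-Jingmen Insect Only",
    "FJAF": "Flavi-Jingmen Aquatic Flavivirus",
    "FJFL": "Flavi-Jingmen Flavi-Like",
    "FJJI": "Flavi-Jingmen Jingmenvirus",
    "FJUN": "Flavi-Jingmen Unclassified",
    "PLLG": "Pesti-LGF Large Genome Flavivirus",
    "PLPV": "Pesti-LGF Pestivirus",
    "PLUN": "Pesti-LGF Unclassified",
    "HPPV": "Hepaci-Pegi Pegivirus",
    "HPHV": "Hepaci-Pegi Hepacivirus",
    "HPUN": "Hepaci-Pegi Unclassified",
    "TOMB": "Tombusvirus out group",
}


def classify(name: str) -> str:
    """Look up the classification from the first four characters of a name.

    Assumption: the code occupies exactly the first four characters;
    whatever follows it (separator, taxon name) is ignored.
    """
    return PREFIXES.get(name[:4].upper(), "unknown prefix")


def block_range(index: int, block_len: int = 300, step: int = 100) -> tuple[int, int]:
    """Return the 0-based, half-open residue range covered by Block_<index>.

    Assumption: consecutive blocks start `step` residues apart, as in the
    examples listed in the description.
    """
    if index < 0:
        raise ValueError("block index must be non-negative")
    start = index * step
    return start, start + block_len


if __name__ == "__main__":
    print(classify("HPHV_example"))  # hypothetical name
    print(block_range(0))
    print(block_range(2))
```

With these defaults, `block_range(0)` covers the first 300 residues, matching Block_0 in the description. If the authors' block script uses a different stride, pass it through the `step` argument.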
Created:
2024-08-06