HK97-fold proteins evolutionary analyses (sequences, structures, and phylogenies)
DataCite Commons. Updated 2026-04-28. Indexed 2026-04-25.
Download link:
https://figshare.com/articles/dataset/HK97-fold_proteins_evolutionary_analyses_sequences_structures_and_phylogenies_/30948881
Description:
This item contains sequence- and structure-based resources supporting the evolutionary analyses of HK97-fold major capsid proteins (MCPs) and encapsulins reported in our manuscript. The work was carried out in the Luque Lab (Department of Biology, University of Miami). Files include the environmental MCP sequence set, reference identifiers and reference MCP sequences, maximum-likelihood phylogenetic inference outputs, protein structure models, and structure-based phylogenetic results spanning viruses and related HK97-fold systems (encapsulins, PICIs, GTAs, and environmental viral-like entities).

Contents:

data_file_S10_environmental_HK97_proteins.faa
FASTA file of amino acid sequences for all HK97-fold major capsid proteins (MCPs) detected in environmental viral genomes.

data_file_S11_environmental_MCPs_trees.zip
IQ-TREE outputs for environmental HK97 MCPs, including maximum-likelihood trees, model statistics, and TreeCluster groupings at varying distance thresholds.

data_file_S12_HK97_reference_identifiers.txt
List of protein and genome identifiers used in the reference HK97-fold dataset, including entries for encapsulins, PICIs, GTAs, and phages.

data_file_S13_IDs_and_encapsulin_enrichment.csv
List of HK97-fold protein sequence identifiers (environmental contigs, reference viruses, and encapsulins), with phage coat and encapsulin coverage per protein.

data_file_S14_HK97_protein_structures.zip
Protein structure models used for structural phylogenetics. Includes both experimentally resolved structures (PDB) and AlphaFold-predicted models with confidence assessments.

data_file_S15_HK97_large_scale_ml3di_structural_tree.txt
Newick file of the structure-based phylogenetic tree (Foldtree 3Di encodings + maximum-likelihood inference) of HK97-fold proteins across viruses, encapsulins, PICIs, GTAs, and viral environmental contigs.

data_file_S16_HK97_large_scale_sequence_tree.txt
Newick file of the sequence-based phylogenetic tree of HK97-fold proteins across viruses, encapsulins, PICIs, GTAs, and viral environmental contigs.

data_file_S17_tail_proteins_non_encapsulin_enriched_genomes.zip
Annotation summaries for tail-related proteins detected in DTR-containing twilight-zone genomes lacking encapsulin domains in their HK97 MCPs.

Notes on reuse: These are processed outputs used in the manuscript's evolutionary analyses. Please cite the associated manuscript and this Figshare item DOI when reusing these resources. For third-party inputs (e.g., PDB, RefSeq, AlphaFold models, and external tools/databases), please follow their respective citation and usage guidance.
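For readers who want to inspect the FASTA file listed above, the following is a minimal sketch of a reader that uses only the Python standard library. It is not part of the dataset or the authors' workflow. The function name `read_fasta`, the example identifiers, and the choice to use the first whitespace-delimited token of each header as the record identifier are assumptions made for illustration; the actual header layout of the file is not described in this record.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} mapping.

    Assumption: the identifier is the first whitespace-delimited token
    after '>' on each header line.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may wrap; collect and join them later.
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical in-memory example; identifiers and sequences are made up.
example = [
    ">contig_1_mcp hypothetical description",
    "MKLT",
    "AAG",
    ">contig_2_mcp",
    "MSE",
]
records = read_fasta(example)

# With the downloaded file, the same function accepts an open file handle:
# with open("data_file_S10_environmental_HK97_proteins.faa") as handle:
#     records = read_fasta(handle)
```

Because the function accepts any iterable of lines, it works on both an open file handle and an in-memory list. A dedicated library such as Biopython may be preferable for larger analyses.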
Provider:
figshare
Created:
2025-12-25



