Evolutionary innovation through fusion of sequences from across the tree of life
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.t1g1jwtdz
Description:
We hypothesized that fusion of genes acquired via horizontal gene transfer (HGT) with endogenous sequences in arthropod genomes might generate what we call “HGT-chimeras”: genes with regions of non-metazoan and metazoan descent in the same open reading frame. This dataset supports the study of these HGT-chimeras presented in our manuscript “Evolutionary innovation through fusion of sequences from across the tree of life”. It includes input data and intermediate output files used in our HGT-chimera detection pipeline, as well as in the downstream bioinformatic characterization of these genes. The repository contains FASTA files of protein sequences, clustering results, phylogenetic trees, and tabular summaries of inferred HGT-chimeras, along with downstream analyses describing sequence molecular evolution (dN/dS), phylogenetic origin, gene expression, and domain architecture. Files are organized to correspond with steps in the associated GitHub pipeline, beginning with input clustering data (mmseq_cluster_representatives_with_missing.fasta) and concluding with analyses of representative HGT-chimeras highlighted in the manuscript’s figures. These data can be reused to validate our findings, extend analyses of discovered HGT-chimeras, or adapt the included pipeline for other genomic datasets. No ethical or legal restrictions apply to the data, which are derived from available genome assemblies and annotation data on NCBI.
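For readers who want to inspect the protein FASTA files before running the full pipeline, the following is a minimal sketch of a FASTA reader using only the Python standard library. It is not part of the authors' pipeline. The function names `read_fasta` and `load_fasta` are hypothetical, and the sketch assumes the files follow the usual FASTA convention of a `>` header line followed by one or more sequence lines.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping.

    The record id is taken as the first whitespace-delimited token after '>'.
    If an id occurs more than once, the later record overwrites the earlier one.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def load_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return read_fasta(handle)
```

Assuming the dataset has been downloaded locally, calling `load_fasta("mmseq_cluster_representatives_with_missing.fasta")` would return the cluster representative sequences keyed by identifier. The file's location on disk is an assumption; only the file name comes from the description above.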
Created:
2025-10-27



