Evolutionary innovation through fusion of sequences from across the tree of life
Source: DataONE. Updated: 2025-10-27. Indexed: 2025-11-01.
Download link:
https://search.dataone.org/view/sha256:5823cdcdca58d8a7f439d3452fac66d1e4d51ab6a35c439d77461fdf84abd364
Description:
We hypothesized that fusion of genes acquired via horizontal gene transfer (HGT) with endogenous sequences in arthropod genomes might generate what we call "HGT-chimeras": genes with regions of non-metazoan and metazoan descent in the same open reading frame. This dataset supports the study of these HGT-chimeras presented in our manuscript "Evolutionary innovation through fusion of sequences from across the tree of life". It includes input data and intermediate output files used in our HGT-chimera detection pipeline, as well as in the downstream bioinformatic characterization of these genes. The repository contains FASTA files of protein sequences, clustering results, phylogenetic trees, and tabular summaries of inferred HGT-chimeras, along with downstream analyses describing sequence molecular evolution (dN/dS), phylogenetic origin, gene expression, and domain architecture. Files are organized to correspond with steps in the associated GitHub pipeline, beginning with input clustering d...

# Data from: Evolutionary innovation through fusion of sequences from across the tree of life
Dataset DOI: [10.5061/dryad.t1g1jwtdz](https://doi.org/10.5061/dryad.t1g1jwtdz)
## Description of the data and file structure
Full details of data processing and analysis are described in the accompanying manuscript and [GitHub repository](https://github.com/rishabhrajkapoor/Arthropod-HGT-chimeras-2025/tree/main).
#### Files and variables
**mmseq_cluster_representatives_with_missing.fasta**
FASTA file of 610,359 proteins used as input to the HGT-chimera detection pipeline. These were obtained via MMseqs2 clustering of proteins from 319 RefSeq arthropod genome annotations, supplemented with 11 proteins from the same annotations that were obtained in a previous pilot iteration of this pipeline. FASTA headers are formatted as "genome accession;protein accession".
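
The header convention above can be illustrated with a minimal Python sketch that splits each FASTA header into its genome and protein accessions. This is not part of the authors' pipeline: the function names `parse_records` and `split_header` are hypothetical, the accession strings in the example are placeholders rather than real entries from the file, and the sketch assumes the first whitespace-delimited token of each header contains a single `;` separator.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split a 'genome accession;protein accession' header into its two parts."""
    # Assumption: only the first whitespace-delimited token carries the accessions.
    tokens = header.split()
    token = tokens[0] if tokens else ""
    genome, sep, protein = token.partition(";")
    if not sep or not genome or not protein:
        raise ValueError(f"header does not match 'genome;protein': {header!r}")
    return genome, protein


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (genome_accession, protein_accession, sequence) for each FASTA record."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield split_header(header) + ("".join(chunks),)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may wrap across several lines
    if header is not None:
        yield split_header(header) + ("".join(chunks),)


# Placeholder records (hypothetical accessions, not taken from the dataset).
example = [
    ">GCF_000000000.1;XP_000000001.1",
    "MKT",
    "AAL",
    ">GCF_000000000.1;XP_000000002.1",
    "MSS",
]
records = list(parse_records(example))
for genome, protein, sequence in records:
    print(genome, protein, len(sequence))
```

Because `parse_records` accepts any iterable of lines, an open file handle for `mmseq_cluster_representatives_with_missing.fasta` can be passed directly, which streams the records instead of loading the whole file into memory.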
**round1_diamond_output.tar.gz**
Tabular output of a DIAMOND BLASTp search of mmseq_cluster_representatives_with_missing.fasta against NR, with standard DIAMOND BLASTp...
Created:
2025-10-28



