Evolutionary innovation through fusion of sequences from across the tree of life

DataONE, updated 2025-10-27, indexed 2025-11-01
Download link:
https://search.dataone.org/view/sha256:5823cdcdca58d8a7f439d3452fac66d1e4d51ab6a35c439d77461fdf84abd364
Description:
We hypothesized that fusion of genes acquired via horizontal gene transfer (HGT) with endogenous sequences in arthropod genomes might generate what we call “HGT-chimeras”: genes with regions of non-metazoan and metazoan descent in the same open reading frame. This dataset supports the study of these HGT-chimeras presented in our manuscript “Evolutionary innovation through fusion of sequences from across the tree of life”. It includes input data and intermediate output files used in our HGT-chimera detection pipeline, as well as in the downstream bioinformatic characterization of these genes. The repository contains FASTA files of protein sequences, clustering results, phylogenetic trees, and tabular summaries of inferred HGT-chimeras, along with downstream analyses describing sequence molecular evolution (dN/dS), phylogenetic origin, gene expression, and domain architecture. Files are organized to correspond with steps in the associated GitHub pipeline, beginning with input clustering d...

# Data from: Evolutionary innovation through fusion of sequences from across the tree of life

Dataset DOI: [10.5061/dryad.t1g1jwtdz](10.5061/dryad.t1g1jwtdz)

## Description of the data and file structure

Full details of data processing and analysis are described in the accompanying manuscript and [GitHub repository](https://github.com/rishabhrajkapoor/Arthropod-HGT-chimeras-2025/tree/main).

#### Files and variables

**mmseq_cluster_representatives_with_missing.fasta** FASTA file of 610,359 proteins as input to the HGT-chimera detection pipeline. Obtained via MMseqs2 clustering of proteins from 319 RefSeq arthropod genome annotations supplemented with 11 proteins from the same annotations that were obtained in a previous pilot iteration of this pipeline. FASTA headers have been set as "genome accession;protein accession".

**round1_diamond_output.tar.gz** Tabular output of DIAMOND BLASTp search of mmseq_cluster_representatives_with_missing.fasta vs NR, with standard DIAMOND BLASTp...
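
The "genome accession;protein accession" header convention described for mmseq_cluster_representatives_with_missing.fasta can be split back into its two fields when reading the file. The following is a minimal sketch, not part of the authors' pipeline: the function names `parse_header` and `read_fasta` are hypothetical, and it assumes each header holds exactly the two accessions separated by the first semicolon, with nothing else on the line.

```python
from typing import Iterable, Iterator, Tuple


def parse_header(header: str) -> Tuple[str, str]:
    """Split a 'genome accession;protein accession' header into its two parts.

    Assumption: the first ';' separates the genome accession from the
    protein accession, and both parts are non-empty.
    """
    text = header.lstrip(">").strip()
    genome, sep, protein = text.partition(";")
    if not sep or not genome or not protein:
        raise ValueError(f"unexpected header format: {header!r}")
    return genome, protein


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (genome_accession, protein_accession, sequence) per FASTA record."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield (*parse_header(header), "".join(chunks))
            header, chunks = line, []
        else:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield (*parse_header(header), "".join(chunks))


# Made-up accessions and sequences, for illustration only.
example = [
    ">GCF_000000000.1;XP_000000000.1",
    "MKT",
    "AAL",
    ">GCF_000000001.1;XP_000000001.1",
    "MSS",
]
records = list(read_fasta(example))
```

Because `read_fasta` accepts any iterable of lines, an open file handle for the FASTA file can be passed in directly, which avoids loading all of the sequences into memory at once.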
Created:
2025-10-28