
VEHoP: A Versatile, Easy-to-use, and Homology-based Phylogenomic pipeline accommodating diverse sequences

Source: DataCite Commons. Updated 2025-12-02; indexed 2024-08-26.
Download link:
https://figshare.com/articles/dataset/VEHoP_A_Versatile_Easy-to-use_and_Homology-based_Phylogenomic_pipeline_accommodating_diverse_sequences/26370955/1
Description:
Phylogenomics has become a prominent method in systematics, conservation biology, and biomedicine, as it can leverage hundreds to thousands of genes derived from genomic or transcriptomic data to infer evolutionary relationships. However, obtaining high-quality genomes and transcriptomes requires well-preserved samples that yield high-quality DNA and RNA, and it demands considerable sequencing costs and substantial bioinformatic effort (e.g., genome/transcriptome assembly and annotation). Notably, for some rare species, such as those inhabiting the deep sea, only fragmented DNA reads are available because samples are difficult to collect and preserve. To address this issue, we here introduce the VEHoP (Versatile, Easy-to-use Homology-based Phylogenomic) pipeline, designed to infer protein-coding regions from DNA assemblies and generate alignments of orthologous sequences, concatenated matrices, and phylogenetic trees. This pipeline aims to 1) expand taxonomic sampling by accommodating a wide range of input files, including draft genomes, transcriptomes, and well-annotated genomes, and 2) simplify phylogenomic analyses and thus make them more accessible to researchers from diverse backgrounds. We first evaluated the performance of VEHoP using datasets of Ostreida, yielding robust phylogenetic trees with strong bootstrap support. We then applied VEHoP to reconstruct the phylogenetic relationships within the enigmatic deep-sea gastropod order Neomphalida, obtaining a robust phylogenetic backbone for this group. VEHoP is freely available on GitHub (https://github.com/ylify/VEHoP), and its dependencies can be easily installed using Bioconda.

Provider:
figshare
Created:
2024-07-26