
Data from: SimPhy: phylogenomic simulation of gene, locus and species trees

Source: DataONE · Updated: 2015-10-30 · Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
We present SimPhy, a fast and flexible software package for simulating multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer (all three of which can lead to species-tree/gene-tree discordance), as well as gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific) that can be fixed or sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and can simulate partitioned nucleotide, codon and protein multilocus sequence alignments under a wide range of substitution models using the program INDELible. We validate SimPhy's output against theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, running an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can help in understanding interactions among different evolutionary processes by conducting a simulation study that characterizes the systematic overestimation of duplication times when standard reconciliation methods are used. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, pre-compiled executables, a detailed manual and example cases.
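
The hierarchical sampling idea described above (a genome-wide value, modified per locus and then per lineage, each drawn from a prior distribution) can be illustrated with a minimal sketch. This is not SimPhy's code or parameterization: the function name `sample_hierarchical_rates`, the lognormal and gamma priors, and all default values are assumptions chosen only to show the nesting of global and local parameters.

```python
import random


def sample_hierarchical_rates(n_loci, n_lineages,
                              alpha_locus=2.0, alpha_lineage=5.0, seed=None):
    """Illustrative three-level rate sampling (hypothetical, not SimPhy's model).

    Level 1: one genome-wide substitution rate.
    Level 2: one multiplier per locus.
    Level 3: one independent multiplier per lineage within each locus,
             in the spirit of an uncorrelated relaxed clock.
    """
    rng = random.Random(seed)

    # Genome-wide rate from an assumed lognormal prior.
    genome_rate = rng.lognormvariate(-4.6, 0.5)

    loci = []
    for _ in range(n_loci):
        # Gamma(shape=a, scale=1/a) has mean 1, so multipliers rescale
        # the genome-wide rate without shifting its expectation.
        locus_mult = rng.gammavariate(alpha_locus, 1.0 / alpha_locus)
        lineage_mults = [
            rng.gammavariate(alpha_lineage, 1.0 / alpha_lineage)
            for _ in range(n_lineages)
        ]
        branch_rates = [genome_rate * locus_mult * m for m in lineage_mults]
        loci.append({"locus_multiplier": locus_mult,
                     "branch_rates": branch_rates})

    return {"genome_rate": genome_rate, "loci": loci}


if __name__ == "__main__":
    sample = sample_hierarchical_rates(n_loci=3, n_lineages=4, seed=42)
    print("genome-wide rate:", sample["genome_rate"])
    for i, locus in enumerate(sample["loci"]):
        print("locus", i, "branch rates:", locus["branch_rates"])
```

Fixing a parameter instead of sampling it corresponds to replacing one of the draws with a constant. For the options SimPhy actually exposes, consult the manual in the repository linked above.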

Created: 2015-10-30