Data from: Gene-tree reconciliation with MUL-trees to resolve polyploidy events
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-27.
Download link:
https://zenodo.org/records/4931499
Description:
Polyploidy can have a huge impact on the evolution of species, and it is a common occurrence, especially in plants. The two types of polyploids, autopolyploids and allopolyploids, differ in the level of divergence between the genes that are brought together in the new polyploid lineage. Because allopolyploids are formed via hybridization, the homoeologous copies of genes within them are at least as divergent as orthologs in the parental species that came together to form them. As a result, common methods for estimating the parental lineages of allopolyploidy events are inaccurate and can lead to incorrect inferences about the number of gene duplications and losses. Here, we have adapted an algorithm for topology-based gene-tree reconciliation to work with multi-labeled trees (MUL-trees). By definition, MUL-trees have some tips with identical labels, which makes them a natural representation of the genomes of polyploids. Using this new reconciliation algorithm we can: accurately place allopolyploidy events on a phylogeny, identify the parental lineages that hybridized to form allopolyploids, distinguish between allo-, auto-, and (in most cases) no polyploidy, and correctly count the number of duplications and losses in a set of gene trees. We validate our method using gene trees simulated with and without polyploidy, and revisit the history of polyploidy in data from the clades that include baker's yeast and bread wheat. Our re-analysis of the yeast data confirms the allopolyploid origin and parental lineages previously identified for this group. The method presented here should find wide use in the growing number of genomes from species with a history of polyploidy.
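To make the idea of reconciling a gene tree against a MUL-tree more concrete, here is a minimal brute-force sketch. It is not the authors' implementation. It assumes binary rooted trees written as nested tuples, a hypothetical naming convention in which gene leaf `B_2` belongs to species `B`, and the standard LCA-mapping rules for counting duplications and losses. Every gene copy is tried against every identically labeled tip of the MUL-tree, and the cheapest assignment is kept. All function names (`build`, `lca`, `score`, `reconcile_mul`) are illustrative. The sketch makes no attempt to reproduce the published method's details, and the number of assignments it tries grows exponentially with the number of polyploid gene copies.

```python
from itertools import product


class Node:
    """Rooted tree node; leaves carry a label, internal nodes do not."""

    def __init__(self, label=None, parent=None, depth=0):
        self.label = label
        self.children = []
        self.parent = parent
        self.depth = depth


def build(spec, parent=None, depth=0):
    """Build a Node tree from a nested tuple such as (("A", "B"), "C")."""
    if isinstance(spec, str):
        return Node(label=spec, parent=parent, depth=depth)
    node = Node(parent=parent, depth=depth)
    node.children = [build(child, node, depth + 1) for child in spec]
    return node


def leaves(node):
    """Leaves in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def lca(a, b):
    """Lowest common ancestor: lift the deeper node, then climb together."""
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a


def score(gene_node, mapping):
    """Return (species_node, duplications, losses) for a gene subtree,
    given a fixed mapping of gene leaves to species-tree leaves."""
    if not gene_node.children:
        return mapping[gene_node], 0, 0
    results = [score(child, mapping) for child in gene_node.children]
    m = results[0][0]
    for child_m, _, _ in results[1:]:
        m = lca(m, child_m)
    dups = sum(r[1] for r in results)
    losses = sum(r[2] for r in results)
    # Duplication if a child maps to the same species node as its parent.
    is_dup = any(child_m is m for child_m, _, _ in results)
    dups += int(is_dup)
    # Species-tree edges skipped on each child branch count as losses;
    # at a speciation node one edge per child is expected, not lost.
    for child_m, _, _ in results:
        losses += child_m.depth - m.depth - (0 if is_dup else 1)
    return m, dups, losses


def reconcile_mul(gene_spec, species_spec):
    """Try every assignment of gene leaves to same-labeled species
    leaves; return (total cost, duplications, losses) of the best one."""
    gene_root, species_root = build(gene_spec), build(species_spec)
    by_label = {}
    for leaf in leaves(species_root):
        by_label.setdefault(leaf.label, []).append(leaf)
    gene_leaves = leaves(gene_root)
    # Hypothetical convention: the species label is the prefix before "_".
    options = [by_label[g.label.split("_")[0]] for g in gene_leaves]
    best = None
    for choice in product(*options):
        _, dups, losses = score(gene_root, dict(zip(gene_leaves, choice)))
        if best is None or dups + losses < best[0]:
            best = (dups + losses, dups, losses)
    return best


# MUL-tree for an allopolyploid B whose parents are the A and C lineages.
species_mul = ((("A", "B"), ("C", "B")), "D")
species_single = ((("A", "B"), "C"), "D")
gene = ((("A_1", "B_1"), ("C_1", "B_2")), "D_1")
print(reconcile_mul(gene, species_mul))     # (0, 0, 0)
print(reconcile_mul(gene, species_single))  # (3, 1, 2)
```

In this toy case, a gene tree with one B copy sister to A and the other sister to C reconciles against the MUL-tree with no duplications or losses. Against the singly labeled species tree, the same gene tree requires one duplication and two losses, which illustrates how forcing homoeologs onto a single-labeled tree can inflate duplication and loss counts.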
Created: 2023-06-28



