A Detailed History of Intron-rich Eukaryotic Ancestors Inferred from a Global Survey of 100 Complete Genomes
Source: Figshare | Updated: 2016-01-18 | Indexed: 2026-04-29
Download link:
https://figshare.com/articles/dataset/A_Detailed_History_of_Intron_rich_Eukaryotic_Ancestors_Inferred_from_a_Global_Survey_of_100_Complete_Genomes/133355
Description:
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities differ widely among eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing the three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. The MCMC and ML inferences agree closely, whereas Dollo parsimony introduces a noticeable bias in the estimates, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches, including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
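For readers unfamiliar with the Dollo parsimony baseline mentioned in the description, the following is a minimal sketch of its core assumption for a single intron site: the intron is gained exactly once (at the last common ancestor of all species that carry it) and may be lost any number of times afterwards. This is an illustration only, not the authors' pipeline; the function name `dollo_presence`, the `children` dictionary layout, and the toy tree are all hypothetical.

```python
def dollo_presence(children, root, present_leaves):
    """Return the set of nodes inferred to carry the intron under Dollo parsimony.

    children: dict mapping each internal node to a list of its child nodes
              (hypothetical tree encoding chosen for this sketch).
    root: name of the root node.
    present_leaves: leaves in which the intron is observed.
    """
    present_leaves = set(present_leaves)
    count = {}  # count[node] = number of intron-carrying leaves at or below node

    def fill(node):
        kids = children.get(node, [])
        if not kids:
            count[node] = 1 if node in present_leaves else 0
        else:
            count[node] = sum(fill(k) for k in kids)
        return count[node]

    total = fill(root)
    if total == 0:
        return set()

    # Walk down from the root while a single child still contains every
    # intron-carrying leaf; the node where this stops is their last common
    # ancestor, which is where the single gain is placed.
    node = root
    while True:
        below = [k for k in children.get(node, []) if count[k] == total]
        if not below:
            break
        node = below[0]

    # The intron is present on every path from the gain node to a carrying
    # leaf; subtrees without any carrying leaf are treated as losses.
    carriers = set()
    stack = [node]
    while stack:
        n = stack.pop()
        if count[n] > 0:
            carriers.add(n)
            stack.extend(children.get(n, []))
    return carriers


# Toy tree (hypothetical): root -> (A, X), X -> (B, Y), Y -> (C, D)
toy_tree = {"root": ["A", "X"], "X": ["B", "Y"], "Y": ["C", "D"]}
carriers = dollo_presence(toy_tree, "root", {"B", "C"})
```

In this toy case the single gain is placed at node `X`, and the intron is inferred to be lost on the branch leading to `D`. Probabilistic methods such as the MCMC and ML approaches named above relax the single-gain assumption by assigning rates to both gain and loss.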
Created:
2016-01-18



