Application of phylogenetic trees in variant analysis, genome evolution and gene functional annotation

Mendeley Data, updated 2024-01-31, indexed 2024-06-27
Download link:
https://digitallibrary.usc.edu/asset-management/2A3BF161SY8B
Description:
Phylogenetics is the study of the hierarchical evolutionary relationships among entities that exist today. A taxonomic tree, for example, is the phylogenetic tree of species: the leaves are the extant species, and the internal nodes are the common ancestors of extant species or of other internal nodes. Phylogenetic gene trees depict the relationships among genes.

In this dissertation, I use the PANTHER phylogenetic gene trees from the lab of Dr. Thomas and Dr. Mi at the University of Southern California. PANTHER stands for Protein Analysis Through Evolutionary Relationships; it classifies proteins into families, which are groups of evolutionarily related proteins. Thus, each PANTHER phylogenetic gene tree represents a group of homologs and orthologs. PANTHER phylogenetic gene trees are also constrained to follow the branching pattern of the taxonomic tree. Internal nodes of these gene trees, that is, the common ancestors of groups of genes, are linked to ancestral species in the taxonomic tree and thus have a biological meaning: they represent ancestral genes and gene events.

There are three types of internal nodes: speciation nodes, duplication nodes, and horizontal gene transfer nodes. Speciation is the origination of a new species. In PANTHER phylogenetic gene trees, a speciation node represents an ancestral gene in the corresponding common ancestor, a species that eventually evolved into extant species. From another point of view, genes in extant species are inherited from ancestral genes that existed in the ancestral species from which the extant species descend. Speciation events are the most common source of genes for a species. Human genes, for example, are inherited from a series of ancestors: primates, mammals, vertebrates, all the way back to Eukaryota, the common ancestor of all eukaryotes, and LUCA, the last universal common ancestor. Duplication nodes depict gene duplication events at specific locations in the gene trees. Gene duplication is a major mechanism that adds new genetic material to genomes.
A duplication node for a specific species is inferred if two or more copies of homologous genes are found in that species. The timing of a duplication is indicated by the species to which the node is assigned. For example, a duplication node whose children are two human genes depicts a duplication event after the divergence of humans from other primates; a duplication node at Mammalia indicates that it has two or more homologous mammalian genes as descendants. Horizontal gene transfer is the non-vertical transmission of genetic material.

PANTHER phylogenetic gene trees convey valuable knowledge about the evolutionary histories of individual gene families. The various species on Earth share remarkable similarity in their genetic material despite their vastly different appearances and behaviors. It is hard to imagine the similarities between a human being and a bacterium, but they are obvious at the genetic level. Genes are indeed the building blocks of life. Evolution is like the string of a necklace that links the pearls of biological knowledge together. With the rapid development of biotechnologies, we are entering an exciting era of deciphering the secrets of life.

In this dissertation, I use the PANTHER phylogenetic gene trees to address several scientific problems.

In the first chapter, I developed a software package, “PANTHER-PSEP”, which uses a new metric called “evolutionary preservation” for gene variant analysis. Ancestral gene sequences are reconstructed from alignments of extant gene sequences in the phylogenetic gene trees. The site (locus) of a variant can be traced back through its ancestral sequences to measure how long it has remained unchanged, giving an estimate of the magnitude of negative selection. This magnitude is reported as the “locus preservation time” and is used to distinguish benign from deleterious variants.
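
The idea of tracing a site back through reconstructed ancestral sequences can be illustrated with a minimal Python sketch. This is not the PANTHER-PSEP implementation: the `Node` class, the `age_my` field, the function name, and the toy sequences and ages are all hypothetical, and the sketch assumes the sequences are already aligned so that one index refers to the same site in every node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node on the path from an extant gene back to the root (hypothetical layout)."""
    name: str
    age_my: float                    # assumed node age in millions of years (0 = extant)
    sequence: str                    # aligned extant or reconstructed ancestral sequence
    parent: Optional["Node"] = None  # next older ancestor, None at the root


def locus_preservation_time(leaf: Node, site: int) -> float:
    """Return how long the residue at `site` has stayed unchanged along the lineage.

    Walks from the extant gene toward the root and stops at the first
    ancestor whose residue at `site` differs from the extant residue.
    """
    residue = leaf.sequence[site]
    preserved_since = leaf.age_my
    node = leaf.parent
    while node is not None and node.sequence[site] == residue:
        preserved_since = node.age_my
        node = node.parent
    return preserved_since - leaf.age_my


# Toy lineage with invented ages and sequences, for illustration only.
root = Node("ancestor_old", 450.0, "MKV")
mid = Node("ancestor_recent", 90.0, "MKL", parent=root)
leaf = Node("extant_gene", 0.0, "MRL", parent=mid)

for site in range(3):
    print(site, locus_preservation_time(leaf, site))
```

In this toy lineage, a site that is identical in every ancestor receives the full age of the oldest node, while a site that changed on the most recent branch receives zero. Under the reasoning described above, a longer preservation time suggests stronger negative selection at that site.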
In the second chapter, I constructed and analyzed over 10,000 gene family trees to reconstruct the gene content of ancestral genomes at an unprecedented scale, covering hundreds of genomes across all domains of life. I find that the rate of gene gain varies widely among branches of the species tree, and that some periods of rapid gene duplication are associated with known whole-genome duplication events.

In the third chapter, I examine a hypothesis that has been hotly debated over the last 20 years: the hypothesis of two rounds of whole-genome duplication in early vertebrates (the 2R hypothesis). I analyze the hypothesis from the combined perspective of phylogenetics and genomic homology, and provide convincing evidence supporting the 2R hypothesis by examining the homology evidence for ancestral duplication events that occurred during specific periods of early vertebrate evolution.

In the last three chapters, I address how gene functions can be inferred from the known functions of related genes with the help of phylogenetic trees. Given the evolutionary histories of genes in the PANTHER gene trees, expert biologists in the PAINT (Phylogenetic Annotation and Inference Tool) project link gene functions, in the form of Gene Ontology annotations, to evolutionarily related genes in each tree, and build an evolutionary model of gene function on top of the evolutionary model of the genes. The function evolution model addresses the emergence, inheritance, and change of gene functions that accompany the gain or loss of gene copies and changes in gene sequences. The functions predicted by the model can be applied to unstudied genes in species that have received much less attention than human, mouse, and several other model organisms. Specifically, I developed a computational method for performing phylogenetic annotation.
One of the cornerstones of this method was generating a database of “taxon constraints” for terms of the Gene Ontology, a set of controlled vocabularies designed to cover the functions of genes in all species. Finally, I developed a method, and associated parameters, for building a probabilistic model of gene function evolution and applying it to predict the functions of uncharacterized genes.
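
As a rough illustration of how taxon constraints could restrict function inference, the Python sketch below checks whether a term may be assigned to a gene in a given taxon. It assumes two kinds of constraint ("only in" and "never in" a taxon). The toy taxonomy, the constraint table, and the function names are hypothetical and are not taken from the dissertation's database or from the Gene Ontology itself.

```python
from typing import Dict, Optional, Set

# Toy taxonomy as child -> parent links (heavily simplified, illustration only).
PARENT: Dict[str, Optional[str]] = {
    "Homo sapiens": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": "Eukaryota",
    "Arabidopsis thaliana": "Viridiplantae",
    "Viridiplantae": "Eukaryota",
    "Eukaryota": "LUCA",
    "Escherichia coli": "Bacteria",
    "Bacteria": "LUCA",
    "LUCA": None,
}

# Hypothetical constraint table: term -> taxa it is restricted to or excluded from.
CONSTRAINTS: Dict[str, Dict[str, Set[str]]] = {
    "lactation": {"only_in": {"Mammalia"}, "never_in": set()},
    "photosynthesis": {"only_in": set(), "never_in": {"Vertebrata"}},
}


def lineage(taxon: str) -> Set[str]:
    """Return the taxon together with all of its ancestors in the toy taxonomy."""
    result: Set[str] = set()
    current: Optional[str] = taxon
    while current is not None:
        result.add(current)
        current = PARENT.get(current)
    return result


def term_allowed(term: str, taxon: str) -> bool:
    """Check whether `term` may be assigned to a gene in `taxon` under the constraints."""
    rule = CONSTRAINTS.get(term)
    if rule is None:
        return True  # unconstrained terms are allowed everywhere
    taxa = lineage(taxon)
    if rule["only_in"] and not (rule["only_in"] & taxa):
        return False  # taxon lies outside every clade the term is restricted to
    if rule["never_in"] & taxa:
        return False  # taxon lies inside a clade where the term is excluded
    return True


print(term_allowed("lactation", "Homo sapiens"))
print(term_allowed("lactation", "Vertebrata"))
print(term_allowed("photosynthesis", "Arabidopsis thaliana"))
```

Because the check also works for ancestral taxa, a term restricted to Mammalia would not be propagated to a vertebrate ancestral gene in this sketch. That is one way such constraints could prune implausible function inferences along a gene tree.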
Created: 2024-01-31