Phylogenetic signal in primate tooth enamel proteins and its relevance for paleoproteomics
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10637109
Resource description:
ABSTRACT
Ancient tooth enamel, and to some extent dentin and bone, contain characteristic peptides that persist for long periods of time. In particular, peptides from the tooth enamel proteome (enamelome) have been used to reconstruct the phylogenetic relationships of fossil specimens and to estimate divergence times. However, the enamelome is encoded by only about ten genes, whose protein products are cleaved in vivo and further fragmented post mortem. Moreover, some enamelome genes are paralogous or may coevolve, and therefore do not act as independent loci. This raises the question of whether the enamelome provides enough information for reliable phylogenetic inference. Here, we address these concerns using a selection of enamel-associated proteins predicted from high-coverage genomic data of 233 primate species. From this predicted proteome, we created multiple sequence alignments (MSAs) for each protein and estimated the evolutionary rate of each site in the MSA. We then examined the sites that overlap with the parts of the protein sequences typically recovered from fossils. Based on this, we simulated ancient data with different degrees of sequence fragmentation, aligned them, and built phylogenetic trees from these fragmented protein alignments. At a fragmentation stage similar to that of previously published fossil samples (dated to 1–2 million years ago, from temperate to tropical zones), most family-level nodes are placed consistently with a previously published reference species tree that is based on the same genomic data as the protein sequence predictions. At higher levels of fragmentation, the placement of family-level nodes contradicts the reference tree and statistical support decreases. We also found that the composition of the proteome influences the phylogenetic placement of tarsiers (infraorder Tarsiiformes): in some cases, tarsiers are incorrectly placed as more closely related to strepsirhines than to anthropoids. For all proteins, we observed a clear trend: variable sites contribute more than conserved sites to recovering phylogenies that resemble the reference species tree. Future experimental and bioinformatic work on peptide sequencing could use this information to target these more informative parts of the enamelome. Based on our results, we recommend that molecular phylogenies inferred from paleoproteomic data be preceded by a characterization of how the proteomes of the closest extant relatives evolved, to maximize the reliability of the inference.
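The abstract does not spell out how fragmentation was simulated. As a minimal sketch, assuming fragmentation can be approximated by randomly losing a given fraction of the residues in an aligned sequence and recording them as missing data ('?'), a taxon's aligned sequence could be masked as below. The function `simulate_fragmentation` and the toy sequence are hypothetical. Real post-mortem fragmentation tends to remove contiguous stretches and to spare particular protein regions, so this is not necessarily the procedure used to build the dataset.

```python
import random

def simulate_fragmentation(aligned_seq: str, loss: float, seed: int = 0) -> str:
    """Return a copy of an aligned protein sequence with a fraction `loss`
    of its residues replaced by '?' (missing data).

    Hypothetical helper: residues are lost uniformly at random, which is a
    simplification of real post-mortem fragmentation. Alignment gaps ('-')
    are kept, so the output has the same length and stays aligned.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducible simulations
    residue_positions = [i for i, aa in enumerate(aligned_seq) if aa != "-"]
    n_lost = round(loss * len(residue_positions))
    lost = set(rng.sample(residue_positions, n_lost))
    return "".join("?" if i in lost else aa for i, aa in enumerate(aligned_seq))


# Toy aligned sequence (not real data) at increasing fragmentation levels.
toy = "MPLPPHPG-HPGYINF"
for loss in (0.0, 0.5, 0.9):
    print(loss, simulate_fragmentation(toy, loss, seed=1))
```

Keeping masked positions in place, rather than deleting them, means the fragmented sequence can be added back to the existing MSA without realignment, and most tree-building programs treat '?' as missing data.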
DATASET GENERATION
Tooth enamel protein sequences were predicted from public high-coverage sequence data of 722 individuals from 233 primate species. In addition, protein sequence data from ancient equids and deinotheriids were generated at the facilities of Pompeu Fabra University and the Center for Genomic Regulation in Barcelona, Spain. The sequence data of all 14 enamel proteins were aligned and quality-checked. From these alignments, Rate4Site scores (Pupko et al. 2002) and Shannon entropy were calculated. Finally, we built phylogenetic trees from different subsets of these proteins.
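Shannon entropy is commonly computed per alignment column as a simple measure of site variability. A minimal sketch follows, assuming entropy in bits (log base 2) over amino-acid frequencies, with gaps ('-') and missing or ambiguous residues ('?', 'X') excluded and all-gap columns scored as 0. The function name `column_entropies` and the toy alignment are hypothetical, and the values in this dataset may have been computed with different conventions. Rate4Site scores are not reproduced here, since Rate4Site infers site rates on a phylogenetic tree rather than from column frequencies alone.

```python
import math
from collections import Counter

def column_entropies(msa: list[str], ignore: str = "-?X") -> list[float]:
    """Shannon entropy (in bits) of each column of a multiple sequence alignment.

    Characters in `ignore` (gaps, missing data, ambiguous residues) are
    excluded from the frequency counts; all-gap columns are scored as 0.0.
    """
    if not msa or len({len(s) for s in msa}) != 1:
        raise ValueError("expected a non-empty list of equal-length sequences")
    entropies = []
    for column in zip(*msa):  # iterate over alignment columns
        counts = Counter(aa for aa in column if aa not in ignore)
        total = sum(counts.values())
        if total == 0:
            entropies.append(0.0)
            continue
        probs = (c / total for c in counts.values())
        entropies.append(sum(-p * math.log2(p) for p in probs))
    return entropies


toy_msa = ["MKV-LS", "MKI-LT", "MRV-LS"]  # toy alignment, not real data
print([round(h, 3) for h in column_entropies(toy_msa)])
# prints [0.0, 0.918, 0.918, 0.0, 0.0, 0.918]
```

Invariant columns score 0, and higher values mark more variable sites, the kind the abstract reports as contributing most to recovering the reference tree.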
Created:
2024-02-27



