Proboscidean Palaeoproteomic Reference Dataset

Indexed from NIAID Data Ecosystem on 2026-05-01

Download link:

https://zenodo.org/record/7848787

Description:

This entry contains the 'Proboscidean Palaeoproteomic Reference Dataset'. We used PaleoProPhyler (https://github.com/johnpatramanis/Proteomic_Pipeline) to generate a palaeoproteomic reference dataset of protein sequences from ancient and present-day Proboscidea. Using the first two modules of PaleoProPhyler, we translated more than 35 publicly available whole genomes from extant and extinct species. Details on how the sequences were processed are given below.

Which individuals / species are included?
The full list of individuals, the original FASTQ repository locations and the species included in the dataset are given in the tab-separated file 'METATADATA.txt', which includes a header row. Most individuals in the dataset belong to one of three species: Loxodonta africana, Elephas maximus and Mammuthus primigenius.

Which proteins are included?
We compiled an initial list of 262 proteins that had been identified in teeth, bone or objects made of ivory. For each protein, both the canonical isoform and all alternative protein-coding isoforms (based on the Loxodonta africana reference proteome in Ensembl) were translated, yielding more than 350 unique protein sequences per individual. The protein list is available in the file 'proteins.txt'.

How were the proteins translated/generated?
All genetic data were downloaded from ENA (https://www.ebi.ac.uk/ena/browser/home) as FASTQ files and mapped onto LoxAfr3, the latest annotated African elephant genome in Ensembl. The mapping scripts are available at https://github.com/johnpatramanis/Mapping_Scripts. We used the resulting BAM files as input to modules 1 and 2 of PaleoProPhyler, with LoxAfr3 as the reference proteome.
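Since 'METATADATA.txt' is tab-separated with a header row, the species composition of the dataset can be tallied with Python's standard csv module. The sketch below is illustrative only: the column name "Species" and the example rows are hypothetical placeholders, because the actual header names in the file are not listed here.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_column(handle: TextIO, column: str) -> Counter:
    """Count how often each value occurs in one column of a TSV file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    # fieldnames is read from the first line; None means the file was empty.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header: {reader.fieldnames}")
    # Short rows get None for missing fields, so fall back to an empty string.
    return Counter((row[column] or "").strip() for row in reader)


# HYPOTHETICAL rows and header names; check the real header of METATADATA.txt first.
example = (
    "Sample\tSpecies\tRepository\n"
    "S1\tLoxodonta africana\tENA\n"
    "S2\tMammuthus primigenius\tENA\n"
    "S3\tLoxodonta africana\tENA\n"
)

counts = count_column(io.StringIO(example), "Species")
print(counts.most_common())
```

On the real file, the same function would be called on `open("METATADATA.txt", newline="")` with whatever column name the header actually uses for species.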
Other files included in the zip folder:
- ALL_PROT_REFERENCE.fa contains all of the sequences generated for the Proboscidean Palaeoproteomic Reference Dataset described above.
- PER_PROTEIN is a folder containing one FASTA file per protein in the dataset; each protein file holds the sequences of all individuals for that protein.
- PER_SAMPLE is a folder containing one FASTA file per sample/individual in the dataset; each sample file holds the sequences of all proteins for that sample.
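The PER_PROTEIN and PER_SAMPLE folders appear to be two regroupings of the sequences collected in ALL_PROT_REFERENCE.fa. The following is a minimal sketch of that regrouping, assuming a HYPOTHETICAL FASTA header convention of `<sample>_<protein>`; the real headers may be structured differently, so `sample_of` and `protein_of` would need to be adapted. The input is toy data, not sequences from the dataset.

```python
import io
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are concatenated."""
    header = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_records(records: Iterable[Record], key: Callable[[str], str]) -> Dict[str, List[Record]]:
    """Group records by a key derived from the header (e.g. sample or protein name)."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for header, seq in records:
        groups[key(header)].append((header, seq))
    return dict(groups)


def to_fasta(records: Iterable[Record]) -> str:
    """Serialise records back to FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# HYPOTHETICAL header convention "<sample>_<protein>"; adjust to the real headers.
def sample_of(header: str) -> str:
    return header.split("_", 1)[0]


def protein_of(header: str) -> str:
    return header.split("_", 1)[-1]


# Toy input standing in for ALL_PROT_REFERENCE.fa (not real data).
example = ">S1_protA\nMKTA\nYIAK\n>S1_protB\nMVLS\n>S2_protA\nMKTAYIAK\n"
records = list(read_fasta(io.StringIO(example)))
per_protein = group_records(records, protein_of)
per_sample = group_records(records, sample_of)
print(sorted(per_protein), sorted(per_sample))
print(to_fasta(per_protein["protA"]), end="")
```

Writing `to_fasta(group)` to one file per key would reproduce a PER_PROTEIN- or PER_SAMPLE-style layout under the assumed header convention.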

Created:

2023-04-27
