Protein haplotype sequences obtained by ProHap from the Human Pangenome Reference Consortium dataset
Source: NIAID Data Ecosystem. Indexed 2026-05-02.
Download link:
https://zenodo.org/record/12686818
Description:
Database of protein sequences obtained using ProHap (https://github.com/ProGenNo/ProHap) on the data set of phased genotypes published by the Human Pangenome Reference Consortium (HPRC), first release (https://github.com/human-pangenomics/hpp_pangenome_resources), 44 samples. We used Ensembl v.110 for the mapping of coordinates between genes, exons, and transcripts.
This repository contains one database created using 43 of the 44 samples of the HPRC release (the haplotypes of the sample NA21309 did not encode any non-canonical sequences and are therefore not included), and a separate database for each of these 43 samples. No filtering on allele frequency or haplotype frequency was applied in any of the databases. The complete configuration file for the ProHap run is attached to this repository.
There is one compressed directory for each of the databases, containing the following files:
F1: The concatenated fasta file, ready to be used with search engines. It contains the following:
Protein haplotype sequences obtained by ProHap
Reference proteome as per Ensembl v.110
Contaminant sequences from the cRAP project (https://www.thegpm.org/crap/)
For this dataset, only the simplified format is provided. The simplified fasta contains only the artificial protein identifier and the matching gene name, and is optimised for compatibility with a wide range of tools. For annotation of peptides using the PeptideAnnotator, please provide the header (F1.2) in addition to the fasta file.
F2: Additional information about the haplotype sequences, to be used for mapping identified peptides to the original haplotypes
F3: Translations of haplotype cDNA sequences, before merging with the reference proteome
For further description of the files, please refer to https://github.com/ProGenNo/ProHap/wiki/Output-files.
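As a rough illustration of working with the concatenated fasta file (F1), the Python sketch below reads fasta records and splits each header into an identifier and the remaining text. The exact header layout of the simplified format is not spelled out here, so treating the first whitespace-delimited token as the protein identifier is an assumption; the record names in the example are made up and do not come from the dataset.

```python
import io
from typing import Iterator, TextIO, Tuple


def split_header(header: str) -> Tuple[str, str]:
    """Split a header into (identifier, rest).

    Assumption: the identifier is the first whitespace-delimited token.
    Check the ProHap wiki for the actual header layout.
    """
    parts = header.split(None, 1)
    identifier = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    return identifier, rest


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, rest_of_header, sequence) for each record."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield split_header(header) + ("".join(chunks),)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them.
            chunks.append(line)
    if header is not None:
        yield split_header(header) + ("".join(chunks),)


# Made-up records for illustration only (not taken from the dataset).
example = ">prot_1 GENE_A\nMKT\nLLV\n>prot_2 GENE_B\nMAA\n"
records = list(read_fasta(io.StringIO(example)))
```

To read a real file, pass an open text handle instead of the `io.StringIO` object, for example `open(path)` for a decompressed fasta file, where `path` is wherever the archive was extracted.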
For the usage of these databases with search engines, and downstream analysis of identified peptides, please refer to the project's wiki page: https://github.com/ProGenNo/ProHap/wiki/Using-the-database-for-proteomic-searches.
When using these databases in your publication, please cite: Vašíček, J., Kuznetsova, K.G., Skiadopoulou, D. et al. ProHap enables human proteomic database generation accounting for population diversity. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02506-0
Created:
2024-12-11



