Protein haplotype sequences obtained by ProHap from the Haplotype Reference Consortium Release 1.1 dataset
NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/12671301
Description:
Database of protein sequences obtained using ProHap (https://github.com/ProGenNo/ProHap) from the dataset of phased genotypes published by the Haplotype Reference Consortium (HRC), Release 1.1 (https://ega-archive.org/datasets/EGAD00001002729). We used Ensembl v. 110 to map coordinates between genes, exons, and transcripts.
Release 1.1 of the HRC is aligned to the GRCh37 reference genome. We performed a liftover to the GRCh38 reference using GeneBe (https://genebe.net/tools/liftover). Variants whose reported alternative allele is the reference allele in GRCh38 were removed. The remaining variants were filtered using a minor allele frequency (MAF) threshold of 1%. After translation, the resulting unique non-canonical protein sequences were filtered using a frequency threshold of 0.5%. The complete configuration file for the ProHap run is attached to this repository.
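The two frequency filters described above can be illustrated with a minimal sketch. This is not ProHap's implementation: the function names, the input representation (a mapping from hypothetical variant IDs to alternative allele frequencies, and a list with one translated protein per phased haplotype for a single transcript), and the choice to keep values at or above each threshold are all assumptions made for illustration.

```python
from collections import Counter

# Thresholds from the dataset description; keeping values >= threshold is an assumption.
MAF_THRESHOLD = 0.01        # 1% minor allele frequency for variants
SEQ_FREQ_THRESHOLD = 0.005  # 0.5% frequency for non-canonical protein sequences


def minor_allele_frequency(alt_freq: float) -> float:
    """MAF of a biallelic variant, given the frequency of its alternative allele."""
    return min(alt_freq, 1.0 - alt_freq)


def filter_variants(alt_freqs: dict[str, float],
                    threshold: float = MAF_THRESHOLD) -> list[str]:
    """Return IDs of variants whose MAF is at least the threshold.

    `alt_freqs` maps a hypothetical variant ID to its alternative allele frequency.
    """
    return [vid for vid, af in alt_freqs.items()
            if minor_allele_frequency(af) >= threshold]


def filter_sequences(haplotype_proteins: list[str], canonical: str,
                     threshold: float = SEQ_FREQ_THRESHOLD) -> dict[str, float]:
    """Return non-canonical sequences, with their frequency, that reach the threshold.

    `haplotype_proteins` holds one translated protein per phased haplotype
    for a single transcript (a simplifying assumption).
    """
    total = len(haplotype_proteins)
    if total == 0:
        return {}
    counts = Counter(seq for seq in haplotype_proteins if seq != canonical)
    return {seq: n / total for seq, n in counts.items() if n / total >= threshold}


if __name__ == "__main__":
    # var2 has MAF 0.005 and var3 has MAF 0.004, so only var1 passes the 1% filter.
    print(filter_variants({"var1": 0.30, "var2": 0.995, "var3": 0.004}))  # ['var1']
    # 6 of 1000 haplotypes carry MKA (0.6%), 4 carry MKV (0.4%).
    haplotypes = ["MKT"] * 990 + ["MKA"] * 6 + ["MKV"] * 4
    print(filter_sequences(haplotypes, canonical="MKT"))  # {'MKA': 0.006}
```

Using the minor allele frequency rather than the alternative allele frequency means a variant whose alternative allele is nearly fixed (such as `var2` above) is treated as rare, since its reference allele is the minor one.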
This dataset contains one compressed directory with the following files:
F1: The concatenated FASTA file, ready for use with search engines, containing:
Protein haplotype sequences obtained by ProHap
Reference proteome as per Ensembl v. 110
Contaminant sequences from the cRAP project (https://www.thegpm.org/crap/)
The file is provided in two formats: full and simplified. The simplified FASTA contains only the artificial protein identifier and the matching gene name in each header, and is optimised for compatibility with a wide range of tools. To annotate peptides using the PeptideAnnotator, please provide the header file (F1.2) in addition to the FASTA file.
F2: Additional information about the haplotype sequences, to be used for mapping identified peptides to the original haplotypes
F3: Translations of haplotype cDNA sequences, before merging with the reference proteome
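As a rough illustration of how the concatenated FASTA (F1) can be consumed before identified peptides are mapped back to haplotypes with F2, the sketch below reads a FASTA stream and lists the entries whose sequence contains a given peptide. It is not the PeptideAnnotator. The identifiers in the example are hypothetical, and treating the first whitespace-delimited header token as the protein identifier is an assumption; the actual header layouts are documented on the ProHap wiki.

```python
import io
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream; multi-line sequences are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # ignore any sequence lines before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def proteins_containing(peptide: str,
                        records: Iterable[tuple[str, str]]) -> list[str]:
    """Return identifiers of entries whose sequence contains the peptide exactly.

    The identifier is assumed to be the first whitespace-delimited header token.
    """
    return [next(iter(header.split()), "")
            for header, seq in records if peptide in seq]


if __name__ == "__main__":
    # Toy records with hypothetical identifiers; real ProHap headers differ.
    example = io.StringIO(">prot_1 GN=ABC\nMKTAYIAK\nQR\n>cont_1\nMKVLAAGI\n")
    print(proteins_containing("AKQR", read_fasta(example)))  # ['prot_1']
```

For a real database, pass an open file handle (for example from `open(path)`) to `read_fasta` instead of the in-memory example. Exact substring matching treats isoleucine (I) and leucine (L) as distinct, although the two residues have identical mass and are not distinguished by mass spectrometry; for annotating peptides against haplotypes, the description above points to the PeptideAnnotator together with F2.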
For further description of the files, please refer to https://github.com/ProGenNo/ProHap/wiki/Output-files.
For using these databases with search engines, and for downstream analysis of identified peptides, please refer to the project's wiki page: https://github.com/ProGenNo/ProHap/wiki/Using-the-database-for-proteomic-searches.
When using these databases in your publication, please cite: Vašíček, J., Kuznetsova, K.G., Skiadopoulou, D. et al. ProHap enables human proteomic database generation accounting for population diversity. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02506-0
Created:
2024-12-11



