Empirical analysis of eukaryotic ER signal peptides
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://data.mendeley.com/datasets/p65tkrr89v
Resource description:
Content:
- An Excel file containing the sequences of all experimentally verified eukaryotic signal peptides available on UniProtKB in May 2021 (as well as all human signal peptide sequences >60 amino acids and selected cases from the literature), together with probabilities for their respective n-, h-, and c-regions.
- Signal peptide sequences were extracted automatically and subjected to region probability prediction with the hidden Markov model algorithm in SignalP 3.0. In addition, all human entries were curated manually, whereby adjacent hydrophobic stretches were added to the h-region.
- The data contain the following information for 1,492 entries: UniProtKB identifier, review status, protein name, organism, taxonomic lineage, full protein sequence, experimental signal peptide sequence, experimental signal peptide length, predicted n-region sequence and length, predicted h-region sequence and length, predicted c-region sequence and length, experimental cleavage site, predicted cleavage site, difference between predicted and experimental cleavage site, prediction probability, author comment. The file also contains the raw data pulled from UniProtKB and the raw output from SignalP 3.0.
- The data have been separated into subsets by cleavage site prediction and probability and by evolutionary lineage; each subset is a separate tab of the Excel file. The tabs are: Summary; Cleavage site identical to experimental; Cleavage site different from experimental; Low probability; Humans (identical cleavage site); Humans (manually curated); Very long SPs; Vertebrates (identical cleavage site); Protostomes (identical cleavage site); Plants (identical cleavage site); Fungi (identical cleavage site). For each subset, the average and median length, standard deviation, and minimum/maximum length of each region are reported.
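The per-subset statistics described above (average, median, standard deviation, minimum and maximum region length) could be recomputed along the following lines. This is a minimal sketch, not the authors' procedure: the function name `summarize_lengths` and the example sequences are hypothetical, and the sample standard deviation is an assumption, since the description does not say which variant the Excel file uses.

```python
from statistics import mean, median, stdev


def summarize_lengths(sequences):
    """Return length statistics for a list of amino-acid sequences."""
    lengths = [len(seq) for seq in sequences]
    if not lengths:
        raise ValueError("at least one sequence is required")
    return {
        "mean": mean(lengths),
        "median": median(lengths),
        # Sample standard deviation (an assumption); undefined for one value.
        "stdev": stdev(lengths) if len(lengths) > 1 else 0.0,
        "min": min(lengths),
        "max": max(lengths),
    }


# Made-up h-region sequences, for illustration only (not from the dataset).
example_h_regions = ["LLVLLALLLA", "ALLLLVALLLAL", "LVALLLLAVAL"]
example_stats = summarize_lengths(example_h_regions)
```

Applied to one region column of one tab, this would yield the five values reported for that region and subset.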
Implications of the data:
- The data show that eukaryotic h-regions are on average 11 amino acids long, with a maximum length of 14 amino acids for high-probability predictions (>0.5). This is considerably shorter than the hydrophobic segments of non-cleaved transmembrane helices.
- Further, the data show that the bulk of length variation in eukaryotic signal peptides stems from the h-region.
- For a substantial subset of the entries (19%), the predicted cleavage site differs from the reported experimental cleavage site, for unclear reasons.
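A figure such as the share of entries with differing cleavage sites can be derived by comparing the two cleavage site columns entry by entry. The sketch below assumes each entry is a dictionary with the hypothetical keys `experimental_site` and `predicted_site` (integer positions). The sign convention of the offset (predicted minus experimental) is also an assumption, and the example entries are invented rather than taken from the dataset.

```python
def cleavage_offsets(entries):
    """Return predicted minus experimental cleavage position for each entry."""
    return [e["predicted_site"] - e["experimental_site"] for e in entries]


def cleavage_mismatch_fraction(entries):
    """Return the fraction of entries whose predicted and experimental sites differ."""
    if not entries:
        raise ValueError("at least one entry is required")
    offsets = cleavage_offsets(entries)
    mismatches = sum(1 for offset in offsets if offset != 0)
    return mismatches / len(entries)


# Invented entries, for illustration only (not from the dataset).
example_entries = [
    {"experimental_site": 22, "predicted_site": 22},
    {"experimental_site": 18, "predicted_site": 18},
    {"experimental_site": 25, "predicted_site": 23},
    {"experimental_site": 30, "predicted_site": 30},
    {"experimental_site": 19, "predicted_site": 19},
]
example_fraction = cleavage_mismatch_fraction(example_entries)
```

Keeping the signed offsets alongside the fraction makes it possible to see whether predictions tend to fall before or after the experimental site.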
Created:
2021-08-05



