The data for: LCR in fungi display functional groups and are depleted in positively charged amino-acids
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/12805821
Resource description:
Abstract
The dataset consists of a tab-separated table (Main_Dataset.tsv) that integrates information about protein domains (pfam_scan, GO terms), low complexity regions (SEG), signal peptides (SignalP), and transmembrane elements (TMHMM) for each of the analysed proteins in 183 fungal proteomes. The Main Dataset is designed to be easy to query with Linux bash commands, e.g. for sorting and subsetting by these traits. It is intended for users interested in detailed annotation of particular protein families, sets of organisms, low complexity regions, or types of proteins (for instance, transmembrane proteins). Thanks to its simple format, it can also be easily extended with additional data.
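The sorting and subsetting mentioned above can also be sketched in Python. This is a minimal illustration, not part of the dataset's tooling: the function name `subset_and_sort` and the three-column sample rows are hypothetical, and the sketch assumes the file has no header row (skip the first line if it does).

```python
import csv
import io

def subset_and_sort(lines, filter_col, min_value, sort_col):
    """Keep rows whose integer value in filter_col is >= min_value,
    then sort the kept rows by the integer in sort_col, largest first."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    kept = [row for row in reader if int(row[filter_col]) >= min_value]
    return sorted(kept, key=lambda row: int(row[sort_col]), reverse=True)

# Made-up rows for illustration only (ID, length, count); not taken from the dataset.
sample = io.StringIO(
    "protA\t350\t2\n"
    "protB\t120\t0\n"
    "protC\t900\t5\n"
)

# Keep rows with a count of at least 1 (column index 2), sorted by length (column index 1).
rows = subset_and_sort(sample, filter_col=2, min_value=1, sort_col=1)
for row in rows:
    print("\t".join(row))
```

For the real file, an open file handle can be passed in place of the `io.StringIO` sample, with zero-based column indices chosen according to the column order given in the Readme section.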
Data types:
Main_Dataset.tsv is a TSV table with protein annotations
Estimate of dataset size:
245 MB
Readme file:
Main_Dataset.tsv is a TSV table with the following columns:
Assembly ID from NCBI
Protein ID (NCBI accession)
Protein length
Presence of protein domains; Boolean
Symbolic localization of protein domains; 10 bins scaled to sum up to total protein length
Number of transmembrane elements predicted with TMHMM
Total length of transmembrane elements
Symbolic localization of transmembrane elements; 10 bins scaled to sum up to total protein length
Presence of signal peptide; Boolean
Total number of LCRs
Total length of LCRs
Symbolic localization of LCR; 3 bins: N-terminus (0-0.25 of protein length), middle (0.25-0.75 of protein length), and C-terminus (0.75-1 of protein length)
Symbolic localization of LCR; 10 bins scaled to sum up to total protein length
LCR sequences in the N-terminal part of protein, separated by a comma
LCR sequences in the middle part of protein, separated by a comma
LCR sequences in the C-terminal part of the protein, separated by a comma
Pfam domains overlapping with LCRs (>80% of LCR length)
Pfam domains in protein (ordered by domain start)
GO terms based on Pfam domains obtained by mapping on pfam2go, separated with the pipe symbol '|'
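The column layout above can be read into named records as sketched below. This is an illustration under stated assumptions: the short names in `COLUMNS` are hypothetical labels for the 19 listed columns, the file is assumed to have no header row, and the sample row is made up (fields whose encoding is not described here, such as the Booleans and bin columns, are left as `.` placeholders and not parsed).

```python
import csv
import io

# Hypothetical short names for the 19 columns, in the order listed above.
COLUMNS = [
    "assembly_id", "protein_id", "length",
    "has_domain", "domain_bins",
    "tm_count", "tm_length", "tm_bins",
    "has_signal_peptide",
    "lcr_count", "lcr_length", "lcr_bins3", "lcr_bins10",
    "lcr_seqs_n", "lcr_seqs_mid", "lcr_seqs_c",
    "pfam_in_lcr", "pfam_domains", "go_terms",
]

def split_field(value, sep):
    """Split a delimited field into a list, dropping empty items."""
    return [item.strip() for item in value.split(sep) if item.strip()]

def read_records(lines):
    """Yield one dict per protein with numeric and list fields parsed."""
    reader = csv.DictReader(lines, fieldnames=COLUMNS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for rec in reader:
        for name in ("length", "lcr_count", "lcr_length"):
            rec[name] = int(rec[name])
        # LCR sequences are comma-separated; GO terms are pipe-separated.
        for name in ("lcr_seqs_n", "lcr_seqs_mid", "lcr_seqs_c"):
            rec[name] = split_field(rec[name], ",")
        rec["go_terms"] = split_field(rec["go_terms"], "|")
        yield rec

# One made-up row for illustration only; all values are placeholders.
fields = [
    "GCA_000000000.0", "XP_000000000.1", "350",
    ".", ".",
    "0", "0", ".",
    ".",
    "2", "27", ".", ".",
    "SSSSSSSSSSSS", "", "QQQQQQQQQQQQQQQ",
    "", "PF00000", "GO:0000001|GO:0000002",
]
sample = io.StringIO("\t".join(fields) + "\n")

records = list(read_records(sample))
rec = records[0]
# Fraction of the protein covered by LCRs.
lcr_fraction = rec["lcr_length"] / rec["length"]
print(rec["protein_id"], rec["go_terms"], round(lcr_fraction, 3))
```

Reading the file row by row through a generator keeps memory use low, which may matter for a table of this size.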
Acknowledgements
This work was supported by National Science Centre grants (#2021/41/B/NZ2/02426 to A.M., #2019/35/D/NZ2/03411 to K.S.).
Created:
2024-11-14



