The data for: LCR in fungi display functional groups and are depleted in positively charged amino-acids

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12805821
Description:
Abstract: The dataset consists of a tab-separated table (Main_Dataset.tsv) that integrates information about protein domains (pfam_scan, GO terms), low complexity regions (SEG), signal peptides (SignalP), and transmembrane elements (TMHMM) for each of the analysed proteins within 183 fungal proteomes. The Main Dataset has been designed to ease further searches with Linux bash commands, e.g. sorting and subsetting by the aforementioned traits. This resource can be used by users interested in detailed annotation of particular protein families, sets of organisms, low complexity regions, or types of proteins (for instance, transmembrane proteins). Thanks to its simple and clear format, it may also be easily enriched with additional data.

Data types: Main_Dataset.tsv is a TSV table with protein annotations.

Estimate of dataset size: 245 MB

Readme file: Main_Dataset.tsv is a TSV table with the following columns:

1. Assembly ID from NCBI
2. Protein ID (NCBI accession)
3. Protein length
4. Presence of protein domains; Boolean
5. Symbolic localization of protein domains; 10 bins scaled to sum up to total protein length
6. Number of transmembrane elements predicted with TMHMM
7. Total length of transmembrane elements
8. Symbolic localization of transmembrane elements; 10 bins scaled to sum up to total protein length
9. Presence of signal peptide; Boolean
10. Total number of LCR
11. Total length of LCR
12. Symbolic localization of LCR; 3 bins: N-terminus (0-0.25 of protein length), middle (0.25-0.75 of protein length), and C-terminus (0.75-1 of protein length)
13. Symbolic localization of LCR; 10 bins scaled to sum up to total protein length
14. LCR sequences in the N-terminal part of the protein, separated by a comma
15. LCR sequences in the middle part of the protein, separated by a comma
16. LCR sequences in the C-terminal part of the protein, separated by a comma
17. Pfam domains overlapping with LCRs (>80% of LCR length)
18. Pfam domains in protein (ordered by domain start)
19. GO terms based on Pfam domains obtained by mapping on pfam2go, separated with the pipe symbol '|'

Acknowledgements: This work was supported by National Science Centre grants (#2021/41/B/NZ2/02426 to AM, #2019/35/D/NZ2/03411 to K.S).
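The column layout described in the README lends itself to simple subsetting. The following is a minimal Python sketch, not part of the dataset itself, that selects proteins with at least a given number of LCRs and splits their GO terms on the pipe symbol. The 0-based column positions are an assumption derived from the order in which the README lists the columns, and the function name `proteins_with_lcr` is hypothetical; whether the file carries a header row is not stated, so non-numeric LCR counts are simply skipped.

```python
import csv
import io

# Assumed 0-based column positions, taken from the order of the README list.
# The real file may differ (e.g. a header row, extra columns); verify first.
COL_PROTEIN_ID = 1   # Protein ID (NCBI accession)
COL_LCR_COUNT = 9    # Total number of LCR
COL_GO_TERMS = 18    # GO terms, separated with '|'


def proteins_with_lcr(handle, min_lcr=1):
    """Yield (protein_id, lcr_count, go_terms) for rows with >= min_lcr LCRs."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= COL_GO_TERMS:
            continue  # skip short or malformed rows
        try:
            lcr_count = int(row[COL_LCR_COUNT])
        except ValueError:
            continue  # skips a possible header row or non-numeric values
        if lcr_count >= min_lcr:
            # Empty GO field yields an empty list rather than [""]
            go_terms = [term for term in row[COL_GO_TERMS].split("|") if term]
            yield row[COL_PROTEIN_ID], lcr_count, go_terms


# Usage sketch (assumes Main_Dataset.tsv is in the working directory):
# with open("Main_Dataset.tsv", newline="") as fh:
#     for protein_id, n_lcr, go in proteins_with_lcr(fh, min_lcr=3):
#         print(protein_id, n_lcr, ",".join(go))
```

Reading the file row by row keeps memory use low for a table of this size; the same filter could equally be written with `awk` or `cut`, as the abstract suggests.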
Created:
2024-11-14