
TemStaPro Datasets

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/7743637
Description:
This dataset contains the protein sequences used to train, validate, and test the binary classifiers that form the TemStaPro program, which predicts protein thermostability with respect to nine temperature thresholds from 40 to 80 degrees Celsius in steps of five degrees.

The data is provided as FASTA files. Each protein sequence has a header made of three values separated by vertical bar symbols: the UniParc taxonomy identifier of the organism to which the protein belongs; the UniProtKB/TrEMBL identifier of the protein sequence; and the organism's growth temperature, taken from the dataset of growth temperatures of over 21 thousand organisms (Engqvist, 2018).

The TemStaPro-Major-30 set is composed of 12 files: one training file, one validation file, one imbalanced testing file, and nine balanced samples of 2000 sequences each, drawn from the balanced testing sets. The TemStaPro-Minor-30 set is composed of cross-validation and testing files, all balanced for the 65 degrees Celsius temperature threshold.

The file SupplementaryFileC2EPsPredictions.tsv contains thermostability predictions for different C2EP groups, made with the default mode of the TemStaPro program.

A detailed description is given in the revised version of the corresponding paper (https://doi.org/10.1093/bioinformatics/btae157). If you use data from this dataset, please cite both the paper and the DOI of the dataset.
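The header layout and the nine thresholds described above can be illustrated with a minimal Python sketch. It is not taken from the TemStaPro code base. The names Record, parse_fasta and threshold_labels, as well as the two example records, are hypothetical. The labelling rule (label 1 when the growth temperature is at or above the threshold) is an assumption made for illustration; the paper should be consulted for the rule actually used.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple

# Nine thresholds from 40 to 80 degrees Celsius in steps of five degrees.
THRESHOLDS = list(range(40, 81, 5))


class Record(NamedTuple):
    taxonomy_id: str    # first header field
    protein_id: str     # second header field
    growth_temp: float  # third header field, degrees Celsius
    sequence: str


def _make_record(header: str, chunks: List[str]) -> Record:
    # The header is expected to hold exactly three "|"-separated values;
    # anything else raises ValueError on unpacking.
    tax_id, protein_id, temp = (field.strip() for field in header.split("|"))
    return Record(tax_id, protein_id, float(temp), "".join(chunks))


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per FASTA entry, joining multi-line sequences."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


def threshold_labels(growth_temp: float) -> Dict[int, int]:
    # ASSUMPTION: a sequence counts as positive for a threshold when the
    # organism's growth temperature is at or above that threshold.
    return {t: int(growth_temp >= t) for t in THRESHOLDS}


# Made-up records for illustration only; not taken from the dataset.
EXAMPLE = """\
>0001|HYPOTHETICAL1|37
MKTAYIAKQR
QISFVKSHFS
>0002|HYPOTHETICAL2|65
MSLLTEVETY
"""

records = list(parse_fasta(EXAMPLE.splitlines()))
labels = threshold_labels(records[1].growth_temp)
```

To read a real file from the archive, an open file handle can be passed to parse_fasta in place of EXAMPLE.splitlines(), since the function only needs an iterable of lines.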
Created:
2024-04-09