TriZOD Dataset 2024-05-09

Source: Figshare. Updated: 2024-05-12. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/TriZOD_Dataset_2024-05-09/25792035
Description:
Abstract

Accurate quantification of intrinsic disorder, crucial for understanding functional protein dynamics, remains challenging. We introduce TriZOD, an innovative scoring system for protein disorder analysis that uses nuclear magnetic resonance (NMR) spectroscopy chemical shifts. Traditional methods provide binary, residue-specific annotations, missing the complex spectrum of protein disorder. TriZOD extends the CheZOD scoring framework with quantitative statistical descriptors, offering a nuanced analysis of intrinsically disordered regions. It calculates per-residue scores from chemical shift data of polypeptides in the Biological Magnetic Resonance Data Bank (BMRB).

The CheZOD Z-score is a quantitative metric for how much a set of experimentally determined chemical shifts deviates from random coil chemical shifts. The TriZOD G-scores extend the Z-scores so that they are independent of the number of available chemical shifts. They are normalized to range between 0 and 1, which is beneficial for interpretation and for use in training disorder predictors. Additionally, TriZOD introduces a refined, automated selection of BMRB datasets, including filters for physicochemical properties, keywords, and chemical denaturants. We calculated G-scores for over 15,000 peptides in the BMRB, approximately ten times the size of previously published CheZOD datasets.

Validation against DisProt annotations demonstrates substantial agreement yet highlights discrepancies, suggesting the need to reevaluate some disorder annotations. TriZOD advances protein disorder prediction by leveraging the full potential of the BMRB database, refining our understanding of disorder, and challenging existing annotations.

Dataset Description

This publication consists of four nested datasets of increasing filter stringency: unfiltered, tolerant, moderate, and strict. An overview of the applied filters is given on the project GitHub repository: https://github.com/MarkusHaak/trizod.

The .json files contain all BMRB entries that satisfy the given filter level. They are not redundancy reduced and they also contain the test set entries, so they are not intended for direct use as training sets in machine learning applications. For that purpose, use only the entries whose IDs appear in the [filter_level]_rest_set.fasta files, and extract the corresponding information, such as TriZOD G-scores and/or physicochemical properties, from the respective .json files.

These fasta files contain the cluster representatives of the redundancy reduction procedure. The procedure was performed iteratively, so that clusters with members found in all filter levels are shared among them and have the same cluster representatives. If necessary, all other cluster members can be retrieved from the given [filter_level]_rest_clu.tsv files.

The file TriZOD_test_set.fasta contains the IDs and sequences of the TriZOD test set. The corresponding data is intended to be taken from the strict dataset.
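The selection step described above (keep only the entries whose IDs appear in a [filter_level]_rest_set.fasta file, then optionally look up other cluster members) might be sketched as follows. This is a minimal illustration, not the authors' tooling. It rests on assumptions the listing does not confirm: that each .json file is a single top-level object keyed by entry ID, that the entry ID is the first whitespace-delimited token of each FASTA header, and that the _rest_clu.tsv files have two tab-separated columns (representative, member). The function names and the JSON file name in the usage comment are hypothetical.

```python
import json
from collections import defaultdict


def read_fasta_ids(fasta_path):
    """Return the set of IDs found in a FASTA file.

    ASSUMPTION: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                if header:
                    ids.add(header.split()[0])
    return ids


def select_entries(json_path, keep_ids):
    """Load a dataset .json file and keep only the records listed in keep_ids.

    ASSUMPTION: the file holds one top-level object mapping entry ID -> record.
    Check the real layout before relying on this.
    """
    with open(json_path) as handle:
        records = json.load(handle)
    return {eid: rec for eid, rec in records.items() if eid in keep_ids}


def read_cluster_members(tsv_path):
    """Map each cluster representative to the list of its members.

    ASSUMPTION: two tab-separated columns per line, representative then member.
    """
    clusters = defaultdict(list)
    with open(tsv_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                clusters[fields[0]].append(fields[1])
    return dict(clusters)


# Hypothetical usage (file names follow the [filter_level] pattern, with
# "strict" as the filter level; "strict.json" is an assumed name):
#   train_ids = read_fasta_ids("strict_rest_set.fasta")
#   test_ids = read_fasta_ids("TriZOD_test_set.fasta")
#   train = select_entries("strict.json", train_ids - test_ids)
#   clusters = read_cluster_members("strict_rest_clu.tsv")
```

Subtracting the test set IDs in the usage comment is only a defensive check; the "rest" files are described as the training-side representatives, so the two sets would be expected not to overlap.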
Created: 2024-05-12