
Training data for SOLeNNoID protein annotation software

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14927496
Description:
This dataset represents an edited and upgraded version of the REPETITA dataset which was used to train and test the SOLeNNoID convolutional neural network for solenoid residue classification. The original dataset comprised PDB files with full or partial protein structures for solenoid and non-solenoid proteins, together with residue spans (e.g. residues 224-280) marking the solenoid regions.

This version includes the original PDB files; distance matrices derived from those files, which account for residues missing from the PDB files; and revised labels, produced by mapping and editing the original residue spans into a per-residue label file that matches each structure/distance matrix. Additionally, extra beta-solenoid entries were manually added to the dataset. Finally, a new test dataset was curated from reviewed solenoid entries in the RepeatsDB database.

The dataset is split into training_validation_dataset and test_dataset directories. The training_validation_dataset directory comprises the original REPETITA dataset, split into alpha-, alpha/beta-, beta-, and non-solenoid directories. Each of these directories contains subdirectories with distance matrices, labels, and PDB structures. In addition, the beta_additional directory contains subdirectories with the distance matrices, labels, and PDB structures of the further manually added beta-solenoid entries. The test_dataset directory comprises directories with non-solenoid and solenoid structures in mmCIF format, as well as a directory with the ground truth labels and the labels predicted by the TAPO, RepeatsDB-Lite, and PRIGSA2 methods.

References and links:

REPETITA: Luca Marsella, Francesco Sirocco, Antonio Trovato, Flavio Seno, Silvio C.E. Tosatto, REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform, Bioinformatics, Volume 25, Issue 12, June 2009, Pages i289–i295, https://doi.org/10.1093/bioinformatics/btp232

RAPHAEL: Ian Walsh, Francesco G. Sirocco, Giovanni Minervini, Tomás Di Domenico, Carlo Ferrari, Silvio C. E. Tosatto, RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures, Bioinformatics, Volume 28, Issue 24, December 2012, Pages 3257–3264, https://doi.org/10.1093/bioinformatics/bts550

SOLeNNoID: Nikov, Georgi and Pretorius, Daniella and Murray, James W., SOLeNNoID: A Deep Learning Pipeline For Solenoid Residue Detection in Protein Structures, bioRxiv, 2024

REPETITA/RAPHAEL dataset link: http://old.protein.bio.unipd.it/raphael/precompiled.html
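
The description mentions two derived artifacts: per-residue labels obtained from residue spans, and distance matrices that account for missing residues. The following is a minimal sketch of how such artifacts could be built, not the authors' actual procedure. The function names, the 0/1 label encoding, the consecutive residue numbering, and the use of NaN for missing residues are all assumptions made for illustration; the files in the dataset may use different conventions.

```python
import math


def spans_to_labels(length, spans, first_residue=1):
    """Expand inclusive residue spans into a per-residue 0/1 label list.

    Assumptions (hypothetical, not taken from the dataset): residues are
    numbered consecutively starting at `first_residue`, and 1 marks a
    solenoid residue.
    """
    labels = [0] * length
    for start, end in spans:
        for resnum in range(start, end + 1):
            idx = resnum - first_residue
            if 0 <= idx < length:  # ignore span positions outside the chain
                labels[idx] = 1
    return labels


def distance_matrix(coords):
    """Build a pairwise Euclidean distance matrix from per-residue coordinates.

    `coords` holds one (x, y, z) tuple per residue, or None where the
    residue is missing from the structure. Rows and columns of missing
    residues are filled with NaN (an assumed convention), so the matrix
    keeps one row per residue and stays aligned with the label list.
    """
    n = len(coords)
    nan = float("nan")
    matrix = [[nan] * n for _ in range(n)]
    for i, a in enumerate(coords):
        if a is None:
            continue
        for j, b in enumerate(coords):
            if b is None:
                continue
            matrix[i][j] = math.dist(a, b)
    return matrix


# Toy example: a 10-residue chain with one solenoid span (residues 3-5)
labels = spans_to_labels(10, [(3, 5)])

# Toy example: three residues, the second one missing from the structure
dm = distance_matrix([(0.0, 0.0, 0.0), None, (3.0, 4.0, 0.0)])
```

Keeping a placeholder row and column for each missing residue is one way to ensure that the matrix dimensions match the length of the per-residue label file.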
Created:
2025-02-25