TopDomain dataset v2.0

Source: DataCite Commons. Updated 2021-05-01; indexed 2024-07-13.
Download link:
https://researchdata.hhu.de/handle/entry/88
Description:
This is the TopDomain dataset v2.0, as described in "TopDomain: Exhaustive Protein Domain Boundary Meta-Prediction Combining Multi-Source Information and Deep Learning" by Daniel Mulnaes, Pegah Golchin, Filip Koenig, and Holger Gohlke.

The dataset contains three folders:
- dataset: the full dataset, together with the TopDomain and TopDomainSeq predictions for it
- training_set: the FASTA files of the TopDomain training set
- test_set: the FASTA files of the TopDomain test set

Each FASTA file has a header with three fields, in the following format:

>system_name|domain_type|boundary_list

where:
- system_name contains the PDB ID and chain ID of the target protein
- domain_type contains the target type, either single-domain or multi-domain
- boundary_list contains the residues annotated as domain boundaries, separated by spaces; this field is empty for single-domain proteins, since they have no domain boundaries

The header is followed by the FASTA sequence of the protein, with at most 100 residues per line.

No protein in the test set shares more than 20% sequence identity with any protein in the training set.
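A minimal Python sketch of a reader for these FASTA files follows. It assumes that every header has exactly three pipe-separated fields, that boundary residues are whitespace-separated integers, and that the domain_type values are spelled as in the description. The names `TopDomainRecord`, `parse_header`, and `read_topdomain_fasta` are hypothetical helpers and are not part of the dataset distribution. The reader rejoins the wrapped sequence lines (at most 100 residues each) into a single string.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO, Tuple


@dataclass
class TopDomainRecord:
    """One FASTA entry (hypothetical container, not shipped with the dataset)."""
    system_name: str        # PDB ID and chain ID, kept as a single string
    domain_type: str        # "single-domain" or "multi-domain" (spelling assumed)
    boundaries: List[int]   # boundary residues; empty for single-domain targets
    sequence: str           # full sequence, rejoined from the wrapped lines


def parse_header(line: str) -> Tuple[str, str, List[int]]:
    """Split '>system_name|domain_type|boundary_list' into its three fields."""
    body = line.strip()[1:]  # drop the leading '>'
    system_name, domain_type, boundary_field = body.split("|", 2)
    # Assumption: boundaries are integer residue numbers separated by spaces.
    # An empty field (single-domain target) yields an empty list.
    boundaries = [int(tok) for tok in boundary_field.split()]
    return system_name, domain_type, boundaries


def read_topdomain_fasta(handle: TextIO) -> Iterator[TopDomainRecord]:
    """Yield one record per header, concatenating the sequence lines after it."""
    header = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield TopDomainRecord(*header, sequence="".join(chunks))
            header = parse_header(line)
            chunks = []
        else:
            chunks.append(line)  # each sequence line holds at most 100 residues
    if header is not None:
        yield TopDomainRecord(*header, sequence="".join(chunks))


if __name__ == "__main__":
    # Toy input with made-up entries, only to exercise the parser.
    toy = ">toyA|multi-domain|40 85\nACDEFGHIKL\nMNPQ\n>toyB|single-domain|\nWYWY\n"
    for rec in read_topdomain_fasta(io.StringIO(toy)):
        print(rec.system_name, rec.domain_type, rec.boundaries, len(rec.sequence))
```

Keeping system_name as one string avoids guessing how the PDB ID and chain ID are delimited, which the description does not specify.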
Provider:
N/A
Created:
2021-05-01