PDBEurope/protein_chain_conformational_states
Hugging Face · Updated 2023-10-24 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/PDBEurope/protein_chain_conformational_states
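Resource summary:
This is a manually curated dataset of protein chain conformational states. It contains open-closed monomer pairs as well as intermediate states, as defined by the authors of each entry. The dataset was built by querying the PDBe database for PDBe entries in open and closed conformations that share 100% sequence identity over the UniProt segment, followed by manual curation. Curation involved cleaning out irrelevant entries, manually examining the publication associated with each PDBe entry, adding missed entries, recording intermediate conformations, and removing entries solved as multimers. The dataset was also cross-referenced against the existing clustering algorithm of PDBe-KB to assess how well current conformational clustering performs.
Provided by: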
PDBEurope
Summary of original information
Dataset overview
Dataset name
Curated dataset of protein chain conformational states
Dataset tags
- Structural biology
- Bioinformatics
- Machine learning
- Conformation
- Conformational state
- Monomeric
- Training data
- Benchmark
- Manually curated
Dataset size
10K<n<100K
Task categories
- feature-extraction
Language
- en
License
cc-by-4.0
Dataset files
benchmarking_monomeric_open_closed_conformers.csv
Column descriptions
UNP_ACC | UniProt accession code
UNP_START | Start of UniProt sequence for given PDBe entries
UNP_END | End of UniProt sequence for given PDBe entries
PDBe_ID | Protein Data Bank code
CHAIN_ID | Author-declared chain ID (char)
label_asym_id | Programmatically assigned chain ID (char)
CONFORMER_ID | Unique code for PDBe entries with a distinct conformation, corresponding to a given UniProt accession
CONFORMER_DESCR | Short description of the conformation, based on the depositor's assessment of the protein/conformation
LIT_CONFIRMED | True/false value based on whether a publication (scientific literature) was available for manually curating clusters. NB: Clusters with 0 in this field should be used with caution.
ALT_CONFORMER_ID | Where the publication for a structure is currently outstanding, an executive decision on the conformation classification is made. Where the literature is not explicit on the features of a given conformation, the second most suitable CONFORMER_ID is provided in this column. Blank cells have no other likely conformation assignment and are therefore the same as in CONFORMER_ID.
ALT_CONFORMER_DESCR | Description of the conformation given in ALT_CONFORMER_ID
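As a minimal sketch of how these columns might be used, the snippet below parses rows with Python's standard `csv` module, falls back to `CONFORMER_ID` when `ALT_CONFORMER_ID` is blank, and groups chains by UniProt segment. The sample rows, the conformer ID format, the helper names, and the assumptions that the file is comma-delimited and stores `LIT_CONFIRMED` as `0`/`1` (or `true`/`false`) are illustrative only and should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, invented for illustration only; they follow the
# column layout described above but are NOT taken from the real file.
SAMPLE_CSV = """UNP_ACC,UNP_START,UNP_END,PDBe_ID,CHAIN_ID,label_asym_id,CONFORMER_ID,CONFORMER_DESCR,LIT_CONFIRMED,ALT_CONFORMER_ID,ALT_CONFORMER_DESCR
X00001,1,100,1abc,A,A,X00001_1,open,1,,
X00001,1,100,2abc,A,A,X00001_2,closed,1,,
X00002,5,250,3abc,B,A,X00002_1,open,0,X00002_2,closed
"""


def is_lit_confirmed(row):
    """Interpret LIT_CONFIRMED, accepting 1/0 or true/false spellings (an assumption)."""
    return row["LIT_CONFIRMED"].strip().lower() in {"1", "true"}


def alt_conformer(row):
    """Return ALT_CONFORMER_ID, or CONFORMER_ID when the alternative is blank."""
    return row["ALT_CONFORMER_ID"].strip() or row["CONFORMER_ID"]


def group_by_segment(rows):
    """Map (UNP_ACC, UNP_START, UNP_END) -> {CONFORMER_ID: [(PDBe_ID, CHAIN_ID), ...]}."""
    segments = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = (row["UNP_ACC"], int(row["UNP_START"]), int(row["UNP_END"]))
        segments[key][row["CONFORMER_ID"]].append((row["PDBe_ID"], row["CHAIN_ID"]))
    return segments


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
# Keep only literature-confirmed rows, since unconfirmed clusters need caution.
confirmed = [r for r in rows if is_lit_confirmed(r)]
segments = group_by_segment(confirmed)
```

To run this against the real data, `io.StringIO(SAMPLE_CSV)` could be replaced with `open("benchmarking_monomeric_open_closed_conformers.csv", newline="")`, assuming the file has been downloaded locally.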
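Curation process
- Query PDBe's Oracle database for PDBe entries with identical sequences for which both open and closed conformations appear in the TITLE field.
- Clean the results by removing entries whose TITLE field contains "open" or "close" without referring to a conformation.
- Manually inspect the remaining entries and curate them against the corresponding publications:
  - Add author-deposited entries that the original search missed because the field lacked "open" or "close".
  - Record intermediate conformations reported by the authors.
  - Remove entries that were deposited as monomers but solved as multimeric complexes.
- Cluster the PDBe entries by author-declared conformation and cross-validate the result against the existing clustering algorithm of PDBe-KB.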
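The keyword screen in the first two steps can be illustrated with a rough Python sketch. This is not the authors' Oracle query: the regular expression, the function name `title_keywords`, and the example titles are all assumptions made for illustration. It also shows why the manual cleaning step is needed, since a title can contain "open" without referring to a conformation.

```python
import re

# Assumed pattern: whole-word "open", "close" or "closed", case-insensitive.
KEYWORD_RE = re.compile(r"\b(open|closed?)\b", re.IGNORECASE)


def title_keywords(title):
    """Return the set of normalized keywords ('open', 'closed') found in a title."""
    found = set()
    for match in KEYWORD_RE.finditer(title):
        word = match.group(1).lower()
        found.add("open" if word == "open" else "closed")
    return found


# Hypothetical titles, invented for illustration.
titles = {
    "1abc": "Crystal structure of a kinase in the open conformation",
    "2abc": "Crystal structure of a kinase in the closed conformation",
    "3abc": "Structure of an open reading frame product",  # keyword hit, not a conformation
    "4abc": "Structure of a hydrolase with bound ligand",
}

hits = {pdb_id: title_keywords(t) for pdb_id, t in titles.items()}
# Entries with at least one keyword hit; these would still need manual review.
candidates = sorted(pdb_id for pdb_id, kw in hits.items() if kw)
```

Here entry `3abc` passes the keyword screen even though "open" refers to an open reading frame, which is the kind of false positive the cleaning and manual inspection steps are meant to remove.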
Curation workflow diagram

Dataset summary figure




