ProteinKG25
Source: arXiv. Indexed 2025-09-30.
Download link:
https://zjunlp.github.io/project/ProteinKG25/
Description:
ProteinKG25 is a large-scale knowledge graph dataset that combines the Gene Ontology (GO) with publicly annotated protein information. It provides descriptions aligned with GO terms together with protein sequences, and it integrates GO terms and annotations from Swiss-Prot. The knowledge graph structure is intended to enhance protein representation learning. The dataset contains 4,990,097 triples in total: 4,879,951 protein-GO triples and 110,146 GO-GO triples. It targets protein representation learning and supports several downstream tasks, including protein function prediction and protein-protein interaction prediction.
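To make the triple structure concrete, here is a minimal Python sketch. It is an illustration only and does not reflect the dataset's actual file format: the `Triple` type, the `triple_kind` helper, the relation names, and all identifiers in `EXAMPLE_TRIPLES` are hypothetical placeholders. Only the three counts are taken from the description above.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    """A (head, relation, tail) edge; a hypothetical type for illustration."""
    head: str
    relation: str
    tail: str


# Counts as reported in the dataset description.
PROTEIN_GO_TRIPLES = 4_879_951
GO_GO_TRIPLES = 110_146
TOTAL_TRIPLES = 4_990_097


def triple_kind(triple: Triple) -> str:
    """Classify a triple by its head entity.

    Assumption: GO terms carry a "GO:" prefix and every other head is a protein.
    """
    return "GO-GO" if triple.head.startswith("GO:") else "protein-GO"


# Placeholder triples; none of these identifiers or relations come from the dataset.
EXAMPLE_TRIPLES = [
    Triple("protein_A", "hypothetical_has_function", "GO:EXAMPLE_1"),
    Triple("protein_B", "hypothetical_located_in", "GO:EXAMPLE_2"),
    Triple("GO:EXAMPLE_1", "hypothetical_is_a", "GO:EXAMPLE_3"),
]

kind_counts = Counter(triple_kind(t) for t in EXAMPLE_TRIPLES)

# The two reported subsets account for the full reported total.
counts_are_consistent = PROTEIN_GO_TRIPLES + GO_GO_TRIPLES == TOTAL_TRIPLES
```

The split by head entity mirrors the two triple types named in the description: protein-GO triples link a protein to a GO term, and GO-GO triples link GO terms to each other.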



