CleverThis/uniprotkb_obsolete_entries_250000000-v1
Source: Hugging Face. Updated 2025-12-29. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/CleverThis/uniprotkb_obsolete_entries_250000000-v1
Description:
A comprehensive protein knowledgebase with functional annotations, converted from RDF to the HuggingFace dataset format for use in machine learning pipelines. The dataset contains about 90 million protein entries and about 3.4 billion triples, and its extracted size is 0.392 GB. It is licensed under CC BY 4.0 and is recommended for protein research, molecular biology, and functional genomics. All semantic information from the original RDF knowledge graph is preserved, which allows lossless round-trip conversion between the RDF and HuggingFace formats. Each RDF triple is represented by six fields: subject, predicate, object, object_type, object_datatype, and object_language. The dataset is maintained by the CleverThis organization and is updated every 8 weeks.
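To illustrate how the six fields could carry a full RDF triple, here is a minimal Python sketch that turns one row into an N-Triples line. It is an assumption-laden illustration, not the maintainers' converter: the `object_type` values ("uri", "bnode", "literal"), the use of `None` for unused fields, the helper names `escape_literal` and `row_to_ntriple`, and the sample rows are all hypothetical and should be checked against the actual data.

```python
def escape_literal(text: str) -> str:
    """Escape backslash, double quote, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriple(row: dict) -> str:
    """Serialize one six-field row as a single N-Triples statement.

    Assumes object_type is one of "uri", "bnode", or "literal"
    (hypothetical values; the real dataset may use other labels).
    """
    subject = row["subject"]
    # Blank-node subjects are assumed to be stored with their "_:" prefix.
    subj = subject if subject.startswith("_:") else f"<{subject}>"
    pred = f"<{row['predicate']}>"

    obj_type = row["object_type"]
    if obj_type == "uri":
        obj = f"<{row['object']}>"
    elif obj_type == "bnode":
        value = row["object"]
        obj = value if value.startswith("_:") else f"_:{value}"
    else:
        obj = f'"{escape_literal(row["object"])}"'
        # A literal carries either a language tag or a datatype, never both.
        if row.get("object_language"):
            obj += f"@{row['object_language']}"
        elif row.get("object_datatype"):
            obj += f"^^<{row['object_datatype']}>"
    return f"{subj} {pred} {obj} ."


# Made-up rows for illustration only; they are not taken from the dataset.
uri_row = {
    "subject": "http://purl.uniprot.org/uniprot/P12345",
    "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "object": "http://purl.uniprot.org/core/Protein",
    "object_type": "uri",
    "object_datatype": None,
    "object_language": None,
}
literal_row = {
    "subject": "http://purl.uniprot.org/uniprot/P12345",
    "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
    "object": 'An "example" label',
    "object_type": "literal",
    "object_datatype": None,
    "object_language": "en",
}

for row in (uri_row, literal_row):
    print(row_to_ntriple(row))
```

Because the object's kind, datatype, and language tag are stored in separate fields, a row holds enough information to rebuild the original RDF term, which is what makes a round trip possible. Rows for this sketch would typically come from the `datasets` library's `load_dataset` function, ideally with `streaming=True` given the stated triple count; the split and configuration names would need to be confirmed on the dataset page.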
Provider:
CleverThis



