CleverThis/uniprotkb_obsolete_entries_10000000-v1
Hugging Face | Updated 2026-01-21 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/CleverThis/uniprotkb_obsolete_entries_10000000-v1
Resource description:
---
license: cc-by-4.0
task_categories:
- text-generation
- feature-extraction
language:
- en
tags:
- rdf
- knowledge-graph
- semantic-web
- triples
size_categories:
- 1K<n<10K
---
# uniprotkb_obsolete_entries_10000000
## Dataset Description
Comprehensive protein knowledgebase with functional annotations
**Original Source:** https://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/uniprotkb_obsolete_entries_10000000.rdf.xz
### Dataset Summary
This dataset contains RDF triples from uniprotkb_obsolete_entries_10000000 converted to
HuggingFace dataset format for easy use in machine learning pipelines.
- **Format:** Originally RDF, converted to Hugging Face Dataset
- **Size:** 0.392 GB (extracted)
- **Entities:** ~90M protein entries
- **Triples:** ~3.4B
- **Original License:** CC BY 4.0
### Recommended Use
Protein research, molecular biology, functional genomics
### Notes
The source data is high quality, with manual curation for Swiss-Prot entries. It is updated every 8 weeks.
## RDF Format
This dataset uses a standard lossless format for representing RDF triples.
Each triple is a row with 6 fields:
- `subject`: Subject URI or blank node
- `predicate`: Predicate URI
- `object`: Object value (URI, literal, or blank node)
- `object_type`: Type of object (`uri`, `literal`, or `blank_node`)
- `object_datatype`: XSD datatype URI (for typed literals)
- `object_language`: Language tag (for language-tagged literals)
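The six fields above carry enough information to rebuild a standard RDF serialization. The following is a minimal sketch, not the converter's own code, of turning one row into an N-Triples line. It assumes that empty or missing `object_datatype` and `object_language` values mean "not set", and that blank node identifiers may or may not already carry the `_:` prefix. The helper names and the `example.org` URIs are hypothetical.

```python
def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def format_resource(value: str, is_blank: bool) -> str:
    """Render a URI or blank node in N-Triples syntax."""
    if is_blank:
        # Assumption: add the "_:" prefix only if the stored id lacks it.
        return value if value.startswith("_:") else "_:" + value
    return "<" + value + ">"


def row_to_ntriples(row: dict) -> str:
    """Convert one six-field row into a single N-Triples statement."""
    subject = row["subject"]
    # Assumption: a subject starting with "_:" is a blank node, else a URI.
    subj = format_resource(subject, subject.startswith("_:"))
    pred = "<" + row["predicate"] + ">"

    if row["object_type"] == "literal":
        obj = '"' + escape_literal(row["object"]) + '"'
        language = row.get("object_language")
        datatype = row.get("object_datatype")
        if language:
            # Language-tagged literals take a tag instead of a datatype.
            obj += "@" + language
        elif datatype:
            obj += "^^<" + datatype + ">"
    else:
        obj = format_resource(row["object"], row["object_type"] == "blank_node")

    return f"{subj} {pred} {obj} ."


# Hypothetical row, for illustration only (not taken from the dataset).
example_row = {
    "subject": "http://example.org/entry/1",
    "predicate": "http://example.org/label",
    "object": "obsolete entry",
    "object_type": "literal",
    "object_datatype": None,
    "object_language": "en",
}
ntriples_line = row_to_ntriples(example_row)
```

Writing one such line per row to a `.nt` file would give output that standard RDF tools can parse, provided the assumptions above hold for the actual data.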
### Loading the Dataset
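```python
from datasets import load_dataset

# Use the full repository ID (namespace/name) on the Hugging Face Hub
dataset = load_dataset("CleverThis/uniprotkb_obsolete_entries_10000000-v1")
# Print each triple in the "train" split as "subject predicate object"
for row in dataset["train"]:
    print(f"{row['subject']} {row['predicate']} {row['object']}")
```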
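Before iterating over every row, it can help to get a quick profile of which predicates and object types occur. The sketch below counts both over the first `limit` rows of any iterable of row dictionaries. The function name `summarize_rows`, the default limit, and the sample rows are hypothetical and only illustrate the six-field layout.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Mapping, Tuple


def summarize_rows(rows: Iterable[Mapping], limit: int = 10_000) -> Tuple[Counter, Counter]:
    """Count predicates and object types over at most `limit` rows."""
    predicate_counts: Counter = Counter()
    object_type_counts: Counter = Counter()
    # islice stops after `limit` rows without materializing the iterable.
    for row in islice(rows, limit):
        predicate_counts[row["predicate"]] += 1
        object_type_counts[row["object_type"]] += 1
    return predicate_counts, object_type_counts


# Hypothetical in-memory rows, for illustration only.
sample_rows = [
    {"subject": "http://example.org/entry/1", "predicate": "http://example.org/type",
     "object": "http://example.org/Protein", "object_type": "uri"},
    {"subject": "http://example.org/entry/1", "predicate": "http://example.org/label",
     "object": "entry one", "object_type": "literal"},
    {"subject": "http://example.org/entry/2", "predicate": "http://example.org/type",
     "object": "http://example.org/Protein", "object_type": "uri"},
]
predicates, object_types = summarize_rows(sample_rows)
```

The same function accepts `dataset["train"]` from the loading example. Passing `streaming=True` to `load_dataset` returns an iterable dataset, so the profile can be computed without downloading all rows first.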
## Citation
If you use this dataset, please cite the original source:
**Dataset:** uniprotkb_obsolete_entries_10000000
**URL:** https://ftp.uniprot.org/pub/databases/uniprot/current_release/rdf/uniprotkb_obsolete_entries_10000000.rdf.xz
**License:** CC BY 4.0
## Conversion Details
- **Converted using:** [RDF to HuggingFace Incremental Converter](https://github.com/CleverThis/cleverernie)
- **Conversion date:** 2026-01-21
- **Format version:** 1.0
---
This dataset is part of the CleverThis knowledge graph collection.
Provider: CleverThis



