BioMedBigDataCenter/ben-entities
Source: Hugging Face. Updated 2026-04-18. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/BioMedBigDataCenter/ben-entities
Resource description:
---
configs:
- config_name: pubmed
  data_files:
  - split: train
    path: data/pubmed/part-*.jsonl.gz
- config_name: pmc
  data_files:
  - split: train
    path: data/pmc/part-*.jsonl.gz
- config_name: uspto
  data_files:
  - split: train
    path: data/uspto/part-*.jsonl.gz
- config_name: clinical_trial
  data_files:
  - split: train
    path: data/clinical_trial/part-*.jsonl.gz
---
# BEN Entities
Full BEN entity extraction results exported from MongoDB as Hub-native
`jsonl.gz` shards.
Each row contains only two fields, `document_id` and `entities`. Entities are
filtered at a score threshold of `0.6`, and scores are rounded to two decimals.
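
For readers who download a shard directly, the row layout described above can be read with the standard library alone. The following is a minimal sketch, not the exporter's own code. It assumes each entry in `entities` is a JSON object with a numeric `score` key; the card does not document the entity schema, so that key name, the helper names `iter_rows` and `keep_confident`, and the stricter cutoff `0.8` are all illustrative assumptions.

```python
import gzip
import json
from typing import Iterator


def iter_rows(path: str) -> Iterator[dict]:
    """Yield one parsed row per non-empty line of a gzip-compressed JSON Lines shard."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def keep_confident(row: dict, min_score: float = 0.8) -> dict:
    """Return a copy of `row` that keeps only entities scoring at least `min_score`.

    ASSUMPTION: every entity is a dict with a numeric "score" key. The key name
    is hypothetical; inspect a real row before relying on it.
    """
    kept = [e for e in row["entities"] if e.get("score", 0.0) >= min_score]
    return {"document_id": row["document_id"], "entities": kept}
```

Because the published rows are already filtered at `0.6`, a cutoff at or below that value should leave rows unchanged; only a stricter cutoff removes anything.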
## Configs
- `pubmed` from Mongo collection `pubmed_ncbi`
- `pmc` from Mongo collection `pmc_xml`
- `uspto` from Mongo collection `patent_uspto`
- `clinical_trial` from Mongo collection `clinical_trial_gov`
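
The mapping above, together with the shard paths declared in the card's YAML header, can be captured in a small lookup. This is only a convenience sketch: the names `CONFIG_TO_COLLECTION` and `shard_glob` are hypothetical helpers and are not part of the dataset or of any library.

```python
# Config name -> source Mongo collection, as listed in this card.
CONFIG_TO_COLLECTION = {
    "pubmed": "pubmed_ncbi",
    "pmc": "pmc_xml",
    "uspto": "patent_uspto",
    "clinical_trial": "clinical_trial_gov",
}


def shard_glob(config_name: str) -> str:
    """Return the repo-relative glob for a config's shards, per the card's YAML header."""
    if config_name not in CONFIG_TO_COLLECTION:
        raise KeyError(f"unknown config: {config_name!r}")
    return f"data/{config_name}/part-*.jsonl.gz"
```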
## Usage
```python
from datasets import load_dataset
ds = load_dataset("BioMedBigDataCenter/ben-entities", name="pubmed", split="train")
print(len(ds), ds[0]["document_id"])
```
```python
from datasets import load_dataset
ds = load_dataset("BioMedBigDataCenter/ben-entities", name="pmc", split="train")
print(len(ds), ds[0]["document_id"])
```
```python
from datasets import load_dataset
ds = load_dataset("BioMedBigDataCenter/ben-entities", name="uspto", split="train")
print(len(ds), ds[0]["document_id"])
```
```python
from datasets import load_dataset
ds = load_dataset("BioMedBigDataCenter/ben-entities", name="clinical_trial", split="train")
print(len(ds), ds[0]["document_id"])
```
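
The snippets above load a whole config before indexing into it. When only a few rows are needed, streaming mode may be preferable. The sketch below uses the `streaming=True` option of `load_dataset` from the `datasets` library; the helper name `peek` and the `CONFIGS` tuple are illustrative assumptions, and the code has not been run against this repository.

```python
CONFIGS = ("pubmed", "pmc", "uspto", "clinical_trial")


def peek(config_name: str, n: int = 3,
         repo_id: str = "BioMedBigDataCenter/ben-entities") -> list:
    """Return the first `n` rows of one config without loading the full split."""
    # Imported lazily so that defining this helper does not require `datasets`.
    from datasets import load_dataset

    stream = load_dataset(repo_id, name=config_name, split="train", streaming=True)
    # A streaming dataset is iterated lazily; take(n) stops after n rows.
    return list(stream.take(n))
```

For example, `peek("pmc", n=1)[0]["document_id"]` would return the first document ID of the `pmc` config. Note that a streamed dataset has no `len()`, so the row count printed by the earlier snippets is not available in this mode.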
Provider:
BioMedBigDataCenter



