Kafoo/therascribe-gold-1M
Source: Hugging Face. Updated: 2025-11-30. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Kafoo/therascribe-gold-1M
Resource description:
---
license: cc-by-4.0
task_categories:
- image-to-text
- visual-question-answering
language:
- en
tags:
- medical
- biomedical
- radiology
- pathology
size_categories:
- 1M<n<10M
---
# TheraScribe Gold 1M Dataset
Research-backed medical image-caption dataset for LLaVA-Med++ fine-tuning.
## Dataset Details
- **Samples**: 1,000,000 image-caption pairs
- **Format**: TOON (compact key=value)
- **Quality**: 9.5/10 gold standard
- **Sources**: Biomedica, Path-VQA, PMC-VQA, PMC-OA
## Usage
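```python
from urllib.request import urlretrieve

# Download the dataset
url = "https://huggingface.co/datasets/kafoo/therascribe-gold-1M/resolve/main/therascribe_gold_1M.txt"
urlretrieve(url, "therascribe_gold_1M.txt")

# Parse TOON format: each line holds space-separated key=value items.
# Note: splitting on spaces means a value that itself contains spaces is cut at the first space.
with open("therascribe_gold_1M.txt", encoding="utf-8") as f:
    for line in f:
        data = dict(item.split("=", 1) for item in line.strip().split(" ") if "=" in item)
        print(data)
```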
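Printing all 1,000,000 records is rarely what you want, so the parsing step above can be wrapped in a small lazy reader. The following is a minimal sketch, not part of the dataset's official tooling: the function names `parse_toon_line` and `iter_records` are hypothetical, the parsing rule is copied from the usage snippet above, and the field names in the sample lines (`id`, `modality`) are made up for illustration since the card does not list the actual keys.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, Optional


def parse_toon_line(line: str) -> Dict[str, str]:
    """Parse one line of space-separated key=value items (same rule as the usage snippet)."""
    # Split each item on the first '=' only, so '=' inside a value is kept.
    return dict(item.split("=", 1) for item in line.strip().split(" ") if "=" in item)


def iter_records(lines: Iterable[str], limit: Optional[int] = None) -> Iterator[Dict[str, str]]:
    """Lazily yield parsed records, skipping lines with no key=value items.

    `limit` caps the number of records returned; None means no cap.
    """
    records = (parse_toon_line(line) for line in lines)
    non_empty = (record for record in records if record)
    return islice(non_empty, limit)


# Hypothetical sample lines; the real file's keys may differ.
sample = ["id=1 modality=xray", "", "id=2 modality=ct"]
first_records = list(iter_records(sample, limit=1))
```

Because `iter_records` accepts any iterable of strings, an open file handle can be passed directly, for example `iter_records(f, limit=5)` inside the `with open(...)` block, to inspect the first few records without reading the whole file.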
## License
CC-BY-4.0
Provider:
Kafoo



