introvoyz041/synthvision-seeds
Source: Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/introvoyz041/synthvision-seeds
Resource description:
---
license: apache-2.0
task_categories:
- visual-question-answering
tags:
- medical
- synthvision
- openmed
size_categories:
- 100K<n<1M
---
# synthvision-seeds

Seed records from 4 open medical image datasets.

**Records**: 119,137
## About
Seed dataset for the [SynthVision pipeline](https://huggingface.co/blog/OpenMed/synthvision). It contains 119,137 records aggregated from 4 open medical image datasets:

| Source | Records | Modality |
|--------|---------|----------|
| [eltorio/ROCO-radiology](https://huggingface.co/datasets/eltorio/ROCO-radiology) | 65,393 | Radiology |
| [OpenMed/multicare-images](https://huggingface.co/datasets/OpenMed/multicare-images) | 50,000 | Mixed |
| [flaviagiammarino/path-vqa](https://huggingface.co/datasets/flaviagiammarino/path-vqa) | 3,430 | Pathology |
| [flaviagiammarino/vqa-rad](https://huggingface.co/datasets/flaviagiammarino/vqa-rad) | 314 | Radiology |

Images are deduplicated by SHA-256 hash. Each record contains a relative image path, the source dataset ID, the imaging modality, and any available metadata (captions or Q&A pairs).
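The card does not say how the deduplication is implemented. The sketch below shows the general idea under two assumptions: the SHA-256 digest is taken over each image file's raw bytes, and the first record seen for a given digest is kept. The helper names (`sha256_hex`, `dedupe_by_sha256`) and the sample record IDs are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw image bytes."""
    return hashlib.sha256(data).hexdigest()


def dedupe_by_sha256(
    items: Iterable[Tuple[str, bytes]],
) -> Iterator[Tuple[str, bytes, str]]:
    """Yield (record_id, image_bytes, digest) for the first occurrence of each image.

    Hypothetical helper: assumes duplicates are identified by hashing the raw
    file bytes and that later copies are simply dropped.
    """
    seen: set[str] = set()
    for record_id, data in items:
        digest = sha256_hex(data)
        if digest in seen:
            continue  # byte-identical image already kept; skip this copy
        seen.add(digest)
        yield record_id, data, digest


if __name__ == "__main__":
    sample = [
        ("roco-1", b"\x89PNG...image-a"),
        ("multicare-7", b"\x89PNG...image-b"),
        ("vqa-rad-3", b"\x89PNG...image-a"),  # same bytes as roco-1
    ]
    kept = [rid for rid, _, _ in dedupe_by_sha256(sample)]
    print(kept)  # ['roco-1', 'multicare-7']
```

Because the digest covers exact bytes, this approach only catches byte-identical copies; the same image re-encoded or resized would produce a different digest.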
## Schema
```
id: str # unique record ID
image: str # relative image path
source: str # source dataset name
modality: str # imaging modality
metadata: dict # captions, Q&A pairs, or labels
```
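For code that consumes these records, the schema can be mirrored as a Python `TypedDict`. This is a hypothetical sketch: `SeedRecord`, `validate_record`, and `_FIELD_TYPES` are not part of the dataset or any library, and since the card only says `metadata` holds captions, Q&A pairs, or labels, its inner keys are left untyped.

```python
from typing import Any, TypedDict, cast


class SeedRecord(TypedDict):
    """Hypothetical typed view of one record, with fields taken from the schema."""

    id: str         # unique record ID
    image: str      # relative image path
    source: str     # source dataset name
    modality: str   # imaging modality
    metadata: dict  # captions, Q&A pairs, or labels (inner keys not documented)


# Expected Python type for each field, copied from the schema listing.
_FIELD_TYPES: dict[str, type] = {
    "id": str,
    "image": str,
    "source": str,
    "modality": str,
    "metadata": dict,
}


def validate_record(row: dict[str, Any]) -> SeedRecord:
    """Check that a row has every schema field with the expected type."""
    for name, expected in _FIELD_TYPES.items():
        if name not in row:
            raise KeyError(f"missing field: {name}")
        if not isinstance(row[name], expected):
            raise TypeError(
                f"{name}: expected {expected.__name__}, got {type(row[name]).__name__}"
            )
    # Keep only the schema fields; a TypedDict is a plain dict at runtime.
    return cast(SeedRecord, {name: row[name] for name in _FIELD_TYPES})
```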
## Loading
```python
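from datasets import load_dataset

# Without a split argument, load_dataset returns a DatasetDict keyed by split name.
# A mirrored copy is also listed as introvoyz041/synthvision-seeds (see the download link above).
ds = load_dataset("OpenMed/synthvision-seeds")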
```
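Once loaded, the records can be tallied or filtered by their `source` and `modality` fields. The helpers below are a hypothetical sketch that works on any iterable of row dicts, such as the rows yielded when iterating one split of the loaded dataset. The split name and the exact spelling of the modality values (for example, whether they match the table's `Radiology`, `Pathology`, and `Mixed`) are not documented on the card and should be checked against the data.

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Mapping


def count_by(rows: Iterable[Mapping[str, Any]], field: str) -> Counter:
    """Count rows per distinct value of `field` (e.g. "source" or "modality")."""
    return Counter(row[field] for row in rows)


def filter_by(
    rows: Iterable[Mapping[str, Any]], field: str, value: str
) -> Iterator[Mapping[str, Any]]:
    """Lazily yield only the rows whose `field` equals `value`."""
    return (row for row in rows if row[field] == value)


if __name__ == "__main__":
    # Hypothetical rows standing in for records from the loaded dataset.
    rows = [
        {"id": "a", "source": "eltorio/ROCO-radiology", "modality": "Radiology"},
        {"id": "b", "source": "flaviagiammarino/path-vqa", "modality": "Pathology"},
        {"id": "c", "source": "flaviagiammarino/vqa-rad", "modality": "Radiology"},
    ]
    print(count_by(rows, "modality"))
    print([r["id"] for r in filter_by(rows, "modality", "Radiology")])  # ['a', 'c']
```

For filtering inside the `datasets` library itself, `Dataset.filter` accepts a predicate over each example dict, such as `lambda r: r["modality"] == "Radiology"`.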
## Links
- [SynthVision blog post](https://huggingface.co/blog/OpenMed/synthvision)
- [Source code](https://github.com/openmed-labs/synthvision)
- [All SynthVision artifacts](https://huggingface.co/collections/OpenMed/synthvision-69baac655b557943aa1babd3)
- [OpenMed on Hugging Face](https://huggingface.co/OpenMed)
---
Provided by: introvoyz041



