OpenMed/synthvision-validated-qwen-by-kimi
收藏Hugging Face2026-03-23 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/OpenMed/synthvision-validated-qwen-by-kimi
下载链接
链接失效反馈官方服务:
资源简介:
---
license: apache-2.0
task_categories:
- visual-question-answering
tags:
- medical
- synthvision
- openmed
size_categories:
- 10K<n<100K
---
# synthvision-validated-qwen-by-kimi

Qwen 3.5 annotations validated by Kimi K2.5 (93.1% pass rate)
**Records**: 55,359
## About
Cross-validated subset from the [SynthVision pipeline](https://huggingface.co/blog/OpenMed/synthvision). Kimi K2.5 reviewed all 59,476 Qwen 3.5 annotations and confirmed 55,359 as consistent with the source images (93.1% pass rate).
Validation criteria: `consistent == true` AND `confidence >= 0.7`. Records that failed validation were removed — primarily cases where the annotator hallucinated findings not visible in the image.
## Schema
```
id: str # unique record ID
image: str # relative image path
conversations: list[dict] # multi-turn ShareGPT format
report: str # clinical narrative
structured_findings: dict # finding_name → value
validation: dict # {consistent, confidence, reason}
quality_score: float # composite quality score
```
## Loading
```python
from datasets import load_dataset
ds = load_dataset("OpenMed/synthvision-validated-qwen-by-kimi")
```
## Links
- [SynthVision blog post](https://huggingface.co/blog/OpenMed/synthvision)
- [Source code](https://github.com/openmed-labs/synthvision)
- [All SynthVision artifacts](https://huggingface.co/collections/OpenMed/synthvision-69baac655b557943aa1babd3)
- [OpenMed on Hugging Face](https://huggingface.co/OpenMed)
---
许可证: Apache-2.0
任务类别:
- 视觉问答(visual-question-answering)
标签:
- 医学
- SynthVision
- OpenMed
数据量级:
- 10000条 < 样本数 < 100000条
---
# 经Kimi K2.5校验的SynthVision-Qwen数据集

Qwen 3.5标注结果经Kimi K2.5校验,整体校验通过率为93.1%
**总记录数**: 55359条
## 数据集说明
本数据集为[SynthVision流水线](https://huggingface.co/blog/OpenMed/synthvision)的交叉校验子集。Kimi K2.5对全部59476条Qwen 3.5标注结果进行了审核,确认其中55359条与源图像内容一致,校验通过率达93.1%。
校验规则为:`consistent == true` 且 `confidence >= 0.7`。未通过校验的记录已被全部移除,此类未通过记录主要为标注者虚构了图像中不存在的医学发现的案例。
## 数据结构
id: str # 唯一记录标识符
image: str # 图像相对路径
conversations: list[dict] # 多轮ShareGPT格式对话
report: str # 临床叙述文本
structured_findings: dict # 以「异常名称→对应取值」形式组织的结构化医学发现
validation: dict # 包含consistent、confidence、reason字段的校验结果字典
quality_score: float # 综合质量评分
## 数据集加载
python
from datasets import load_dataset
ds = load_dataset("OpenMed/synthvision-validated-qwen-by-kimi")
## 相关链接
- [SynthVision官方博客文章](https://huggingface.co/blog/OpenMed/synthvision)
- [SynthVision源代码仓库](https://github.com/openmed-labs/synthvision)
- [SynthVision全系列相关产物](https://huggingface.co/collections/OpenMed/synthvision-69baac655b557943aa1babd3)
- [Hugging Face平台OpenMed官方主页](https://huggingface.co/OpenMed)
提供机构:
OpenMed



