mugezhang/medical-temporal-reasoning-sft
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/mugezhang/medical-temporal-reasoning-sft
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train_answer_only
    path: sft_train_answer_only.json
  - split: train_reasoning
    path: sft_train_reasoning.json
  - split: val_answer_only
    path: sft_val_answer_only.json
  - split: val_reasoning
    path: sft_val_reasoning.json
  features:
  - name: item_id
    dtype: string
  - name: image_path
    sequence: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: process
    dtype: string
  - name: dataset_name
    dtype: string
---
# Medical Temporal Reasoning — SFT Dataset
Supervised fine-tuning (SFT) data for a medical temporal reasoning model that,
given a current and a prior chest X-ray, produces step-by-step reasoning about
disease progression. The records are built from MIMIC-CXR image pairs. Images
are not included; `image_path` holds relative paths into the MIMIC-CXR-JPG dataset.
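
Because the images ship separately, each record has to be joined against a local copy of MIMIC-CXR-JPG before use. The sketch below shows one way to do that; the root directory argument and both helper names are hypothetical, and the card does not specify where inside MIMIC-CXR-JPG the relative paths are anchored.

```python
from pathlib import Path


def resolve_image_paths(record: dict, mimic_root: str) -> list[Path]:
    """Join every relative entry of `image_path` onto a local root directory.

    `mimic_root` is an assumption: the directory of your own MIMIC-CXR-JPG
    download that the relative paths are expected to start from.
    """
    root = Path(mimic_root)
    return [root / rel for rel in record["image_path"]]


def missing_images(record: dict, mimic_root: str) -> list[Path]:
    """Return the resolved paths that do not exist on disk."""
    return [p for p in resolve_image_paths(record, mimic_root) if not p.is_file()]
```

Running `missing_images` over a split before training is a cheap way to confirm that the chosen root matches the path layout used in the records.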
## Splits
| Split | Records | Description |
|---|---|---|
| `train_answer_only` | 158,439 | Training records, answer only (`process=null`) |
| `train_reasoning` | 10,559 | Training records with GPT-mined `<think>` traces |
| `val_answer_only` | 7,933 | Validation records, answer only (`process=null`) |
| `val_reasoning` | 516 | Validation records with `<think>` traces |

The four splits are fully disjoint: no `item_id` appears in more than one split.
The train/validation split is by patient (`subject_id`), so no patient appears in both.
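
The disjointness claim can be checked after loading the splits. The following is a minimal sketch, assuming each split is available as a list of record dicts; `shared_keys` is a hypothetical helper, not part of the dataset. Since `subject_id` is not one of the listed features, a patient-level check would need a caller-supplied `key` function that recovers the patient identifier from a record.

```python
from itertools import combinations
from typing import Callable, Hashable


def shared_keys(
    splits: dict[str, list[dict]],
    key: Callable[[dict], Hashable] = lambda record: record["item_id"],
) -> dict[tuple[str, str], set]:
    """Return the keys shared by each pair of splits (empty dict if disjoint)."""
    # Collect the set of key values seen in every split.
    keys = {name: {key(r) for r in records} for name, records in splits.items()}
    overlaps = {}
    for a, b in combinations(keys, 2):
        common = keys[a] & keys[b]
        if common:
            overlaps[(a, b)] = common
    return overlaps
```

An empty result means no key value is shared by any pair of splits.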
## Sources
| dataset_name | Type | Records |
|---|---|---|
| `mmxu` | MCQ, region-level progression | 90,000 |
| `meddiffvqa` | Open-ended, entity-level change | 85,339 |
| `custom_mcq_ms_cxr_t` | MCQ, disease-level (GPT-generated from MS-CXR-T) | 2,108 |
## Format
Each record:
```json
{
  "item_id": "mmxu_12345",
  "image_path": ["p17/.../current.jpg", "p17/.../prior.jpg"],
  "question": "How has the left lung changed? A) Worsened B) Improved ...",
  "answer": "<answer>A</answer>",
  "process": "<think>Comparing the two images...</think>",
  "dataset_name": "mmxu_answer_only"
}
```
`process` is `null` in the `*_answer_only` splits and a `<think>...</think>`
string in the `*_reasoning` splits.
Provider:
mugezhang



