
mugezhang/medical-temporal-reasoning-sft

Source: Hugging Face | Updated 2026-04-08 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/mugezhang/medical-temporal-reasoning-sft
Description:
---
configs:
- config_name: default
  data_files:
  - split: train_answer_only
    path: sft_train_answer_only.json
  - split: train_reasoning
    path: sft_train_reasoning.json
  - split: val_answer_only
    path: sft_val_answer_only.json
  - split: val_reasoning
    path: sft_val_reasoning.json
  features:
  - name: item_id
    dtype: string
  - name: image_path
    sequence: string
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: process
    dtype: string
  - name: dataset_name
    dtype: string
---

# Medical Temporal Reasoning: SFT Dataset

SFT training data for a medical temporal reasoning model that, given a current and prior chest X-ray, produces step-by-step reasoning about disease progression. Built from MIMIC-CXR image pairs. Images are not included; `image_path` contains relative paths into the MIMIC-CXR-JPG dataset.

## Splits

| Split | Records | Description |
|---|---|---|
| `train_answer_only` | 158,439 | Training records, answer only (`process=null`) |
| `train_reasoning` | 10,559 | Training records with GPT-mined `<think>` traces |
| `val_answer_only` | 7,933 | Validation records, answer only (`process=null`) |
| `val_reasoning` | 516 | Validation records with `<think>` traces |

The four splits are fully disjoint (no shared `item_id`). The train/val split is by patient (`subject_id`), so no patient appears in both.

## Sources

| dataset_name | Type | Records |
|---|---|---|
| `mmxu` | MCQ, region-level progression | 90,000 |
| `meddiffvqa` | Open-ended, entity-level change | 85,339 |
| `custom_mcq_ms_cxr_t` | MCQ, disease-level (GPT-generated from MS-CXR-T) | 2,108 |

## Format

Each record:

```json
{
  "item_id": "mmxu_12345",
  "image_path": ["p17/.../current.jpg", "p17/.../prior.jpg"],
  "question": "How has the left lung changed? A) Worsened B) Improved ...",
  "answer": "<answer>A</answer>",
  "process": "<think>Comparing the two images...</think>",
  "dataset_name": "mmxu_answer_only"
}
```

`process` is `null` in answer_only splits and a `<think>...</think>` string in reasoning splits.
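The record format described above can be unpacked with a few lines of standard-library Python. The following is a minimal sketch, not code shipped with the dataset: `MIMIC_CXR_JPG_ROOT`, `extract_tag`, and `parse_record` are hypothetical names, the image root is a placeholder path, the current-then-prior ordering of `image_path` is inferred from the example record, and the sample record below is made up for illustration.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Hypothetical location of a local MIMIC-CXR-JPG copy; adjust to your setup.
MIMIC_CXR_JPG_ROOT = Path("/data/mimic-cxr-jpg/files")

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_tag(pattern: re.Pattern, text: Optional[str]) -> Optional[str]:
    """Return the stripped text inside the tag, or None if the tag or text is absent."""
    if text is None:
        return None
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def parse_record(record: dict, image_root: Path = MIMIC_CXR_JPG_ROOT) -> dict:
    """Unpack one record into plain fields with image paths joined to a local root."""
    # Assumed order, based on the example record: current image first, prior second.
    current_rel, prior_rel = record["image_path"]
    return {
        "item_id": record["item_id"],
        "current_image": image_root / current_rel,
        "prior_image": image_root / prior_rel,
        "question": record["question"],
        "answer": extract_tag(ANSWER_RE, record["answer"]),
        # None for answer_only records, where `process` is null.
        "reasoning": extract_tag(THINK_RE, record["process"]),
    }


# A made-up answer-only record in the documented shape (paths are placeholders).
example = json.loads("""
{
  "item_id": "mmxu_12345",
  "image_path": ["p17/current.jpg", "p17/prior.jpg"],
  "question": "How has the left lung changed? A) Worsened B) Improved",
  "answer": "<answer>A</answer>",
  "process": null,
  "dataset_name": "mmxu"
}
""")

parsed = parse_record(example)
print(parsed["answer"], parsed["reasoning"])
```

To apply this to a real split, one would load a file such as `sft_train_reasoning.json` with `json.load` and call `parse_record` on each element. That assumes each split file is a JSON array of records, which the description does not state explicitly.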
Provided by:
mugezhang