wanglab/eurorad-gpt-oss-training-data
收藏Hugging Face2026-03-03 更新2026-04-05 收录
下载链接:
https://hf-mirror.com/datasets/wanglab/eurorad-gpt-oss-training-data
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- medical
- radiology
- diagnosis
- clinical-reasoning
- eurorad
- clinical-decision-support
size_categories:
- 1K<n<10K
---
<div align="center">
**Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support**
</div>
<div align="center">
<table align="center">
<tr>
<td><a href="[arXiv link - to be added]" target="_blank"><img src="https://img.shields.io/badge/arXiv-Paper-FF6B6B?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper"></a></td>
<td><a href="https://github.com/bowang-lab/on-device-LLM" target="_blank"><img src="https://img.shields.io/badge/GitHub-Code-181717?style=for-the-badge&logo=github&logoColor=white" alt="Code"></a></td>
<td><a href="https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b" target="_blank"><img src="https://img.shields.io/badge/HuggingFace-Model-FFBF00?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace Model"></a></td>
<td><a href="https://huggingface.co/datasets/wanglab/eurorad-gpt-oss-training-data" target="_blank"><img src="https://img.shields.io/badge/HuggingFace-Dataset-28A745?style=for-the-badge&logo=huggingface&logoColor=white" alt="Dataset"></a></td>
</tr>
</table>
</div>
## Authors
<p align="center">
<a href="https://huggingface.co/alif-munim">Alif Munim</a><sup>* 1</sup>,
<a href="https://scholar.google.com.hk/citations?hl=en&user=bW1UV4IAAAAJ">Jun Ma</a><sup>* 1,2</sup>,
<a href="https://huggingface.co/omareng">Omar Ibrahim</a><sup>* 1</sup>,
<b>Alhusain Abdalla</b><sup>* 1</sup>,
Shuolin Yin<sup>3</sup>,
Leo Chen<sup>4</sup>,
<a href="https://scholar.google.ca/citations?user=37FDILIAAAAJ&hl=en">Bo Wang</a><sup>† 1,5,6,7,8</sup>
</p>
<p align="center">
<sup>*</sup> Equal contribution <sup>†</sup> Corresponding author
</p>
<p align="center">
<sup>1</sup>AI Collaborative Centre, University Health Network, Toronto, Canada<br>
<sup>2</sup>Princess Margaret Cancer Centre, University Health Network, Toronto, Canada<br>
<sup>3</sup>Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada<br>
<sup>4</sup>Division of Urology, Department of Surgery, St. Michael's Hospital, Unity Health Toronto and University of Toronto, Toronto, Canada<br>
<sup>5</sup>Peter Munk Cardiac Centre, University Health Network, Toronto, Canada<br>
<sup>6</sup>Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada<br>
<sup>7</sup>Department of Computer Science, University of Toronto, Toronto, Canada<br>
<sup>8</sup>Vector Institute for Artificial Intelligence, Toronto, Canada
</p>
## Highlights
- 1,894 EuroRad medical radiology cases with expert-level diagnostic reasoning
- Each case enhanced with detailed chain-of-thought reasoning generated by GPT-OSS 120B
- Used to fine-tune [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b) for on-device clinical decision support
- Part of a broader benchmark study on adapting on-device LLMs for medical tasks
## Dataset Overview
This dataset contains 1,894 medical radiology cases sourced from Eurorad, each enhanced with detailed step-by-step diagnostic reasoning generated by GPT-OSS 120B. It was used to train the LoRA fine-tuned model available at [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b).
Each case includes the clinical history, imaging findings, a differential diagnosis list, the confirmed final diagnosis, and a structured reasoning trace that maps symptoms to differentials and converges on the correct diagnosis.
## Dataset Structure
| Field | Type | Description |
|-------|------|-------------|
| `case_id` | string | Unique case identifier (e.g., case_0001) |
| `PostDescription` | string | Clinical history, patient presentation, and imaging findings |
| `DifferentialDiagnosisList` | string | List of differential diagnoses considered |
| `FinalDiagnosis` | string | Confirmed final diagnosis |
| `gptoss120b_reasoning` | string | Step-by-step diagnostic reasoning generated by GPT-OSS 120B |
## Dataset Statistics
| | |
|--|--|
| **Total Cases** | 1,894 |
| **Source** | Eurorad (European Society of Radiology) |
| **Reasoning Model** | GPT-OSS 120B |
| **File Format** | Parquet |
| **Language** | English |
## Usage
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("wanglab/eurorad-gpt-oss-training-data")
# Access training data
train_data = dataset['train']
# Example: print first case
print(train_data[0])
```
## Citation
> Citation will be updated upon arXiv submission and journal publication.
```bibtex
@article{munim2025ondevice,
title={Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support},
author={Munim, Alif and Ma, Jun and Ibrahim, Omar and Abdalla, Alhusain and Yin, Shuolin and Chen, Leo and Wang, Bo},
journal={},
year={2025}
}
```
## Limitations
- **Medical Validation Required**: Reasoning is AI-generated and requires clinical validation
- **Not for Clinical Use**: Dataset is for research and training purposes only
- **Specialty Coverage**: May not cover all radiology subspecialties equally
- **Reasoning Quality**: AI-generated reasoning may contain errors or hallucinations
- **Language**: English only
## Related Resources
- **Fine-tuned Model**: [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b)
- **Code Repository**: [github.com/bowang-lab/on-device-LLM](https://github.com/bowang-lab/on-device-LLM)
## Contact
For issues and questions, please open a discussion in this repository.
Corresponding author: Bo Wang — bowang@vectorinstitute.ai
> **Disclaimer**: This dataset is for research purposes only. The diagnostic reasoning is AI-generated and has not been clinically validated. Always consult qualified healthcare professionals for medical decisions.
提供机构:
wanglab



