wanglab/eurorad-gpt-oss-training-data

Name: wanglab/eurorad-gpt-oss-training-data
Creator: wanglab
Published: 2026-03-03 14:27:38
License: cc-by-4.0 (as declared in the dataset card metadata)

Source: Hugging Face. Updated 2026-03-03; indexed 2026-04-05.

Download link:

https://hf-mirror.com/datasets/wanglab/eurorad-gpt-oss-training-data


Description:

---
license: cc-by-4.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- medical
- radiology
- diagnosis
- clinical-reasoning
- eurorad
- clinical-decision-support
size_categories:
- 1K<n<10K
---

<div align="center">

**Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support**

</div>

<div align="center">
<table align="center">
<tr>
<td><a href="[arXiv link - to be added]" target="_blank"><img src="https://img.shields.io/badge/arXiv-Paper-FF6B6B?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper"></a></td>
<td><a href="https://github.com/bowang-lab/on-device-LLM" target="_blank"><img src="https://img.shields.io/badge/GitHub-Code-181717?style=for-the-badge&logo=github&logoColor=white" alt="Code"></a></td>
<td><a href="https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b" target="_blank"><img src="https://img.shields.io/badge/HuggingFace-Model-FFBF00?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace Model"></a></td>
<td><a href="https://huggingface.co/datasets/wanglab/eurorad-gpt-oss-training-data" target="_blank"><img src="https://img.shields.io/badge/HuggingFace-Dataset-28A745?style=for-the-badge&logo=huggingface&logoColor=white" alt="Dataset"></a></td>
</tr>
</table>
</div>

## Authors

<a href="https://huggingface.co/alif-munim">Alif Munim</a>*<sup>1</sup>, <a href="https://scholar.google.com.hk/citations?hl=en&user=bW1UV4IAAAAJ">Jun Ma</a>*<sup>1,2</sup>, <a href="https://huggingface.co/omareng">Omar Ibrahim</a>*<sup>1</sup>, Alhusain Abdalla*<sup>1</sup>, Shuolin Yin<sup>3</sup>, Leo Chen<sup>4</sup>, <a href="https://scholar.google.ca/citations?user=37FDILIAAAAJ&hl=en">Bo Wang</a>†<sup>1,5,6,7,8</sup>

\* Equal contribution &nbsp;&nbsp;&nbsp; † Corresponding author

<sup>1</sup>AI Collaborative Centre, University Health Network, Toronto, Canada<br>
<sup>2</sup>Princess Margaret Cancer Centre, University Health Network, Toronto, Canada<br>
<sup>3</sup>Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada<br>
<sup>4</sup>Division of Urology, Department of Surgery, St. Michael's Hospital, Unity Health Toronto and University of Toronto, Toronto, Canada<br>
<sup>5</sup>Peter Munk Cardiac Centre, University Health Network, Toronto, Canada<br>
<sup>6</sup>Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada<br>
<sup>7</sup>Department of Computer Science, University of Toronto, Toronto, Canada<br>
<sup>8</sup>Vector Institute for Artificial Intelligence, Toronto, Canada

## Highlights

- 1,894 EuroRad medical radiology cases with expert-level diagnostic reasoning
- Each case enhanced with detailed chain-of-thought reasoning generated by GPT-OSS 120B
- Used to fine-tune [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b) for on-device clinical decision support
- Part of a broader benchmark study on adapting on-device LLMs for medical tasks

## Dataset Overview

This dataset contains 1,894 medical radiology cases sourced from Eurorad, each enhanced with detailed step-by-step diagnostic reasoning generated by GPT-OSS 120B. It was used to train the LoRA fine-tuned model available at [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b).

Each case includes the clinical history, imaging findings, a differential diagnosis list, the confirmed final diagnosis, and a structured reasoning trace that maps symptoms to differentials and converges on the correct diagnosis.
## Dataset Structure

| Field | Type | Description |
|-------|------|-------------|
| `case_id` | string | Unique case identifier (e.g., case_0001) |
| `PostDescription` | string | Clinical history, patient presentation, and imaging findings |
| `DifferentialDiagnosisList` | string | List of differential diagnoses considered |
| `FinalDiagnosis` | string | Confirmed final diagnosis |
| `gptoss120b_reasoning` | string | Step-by-step diagnostic reasoning generated by GPT-OSS 120B |

## Dataset Statistics

| | |
|--|--|
| **Total Cases** | 1,894 |
| **Source** | Eurorad (European Society of Radiology) |
| **Reasoning Model** | GPT-OSS 120B |
| **File Format** | Parquet |
| **Language** | English |

## Usage

```python
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("wanglab/eurorad-gpt-oss-training-data")

# Access training data
train_data = dataset['train']

# Example: print first case
print(train_data[0])
```

## Citation

> Citation will be updated upon arXiv submission and journal publication.

```bibtex
@article{munim2025ondevice,
  title={Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support},
  author={Munim, Alif and Ma, Jun and Ibrahim, Omar and Abdalla, Alhusain and Yin, Shuolin and Chen, Leo and Wang, Bo},
  journal={},
  year={2025}
}
```

## Limitations

- **Medical Validation Required**: Reasoning is AI-generated and requires clinical validation
- **Not for Clinical Use**: Dataset is for research and training purposes only
- **Specialty Coverage**: May not cover all radiology subspecialties equally
- **Reasoning Quality**: AI-generated reasoning may contain errors or hallucinations
- **Language**: English only

## Related Resources

- **Fine-tuned Model**: [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b)
- **Code Repository**: [github.com/bowang-lab/on-device-LLM](https://github.com/bowang-lab/on-device-LLM)

## Contact

For issues and questions, please open a discussion in this repository.

Corresponding author: Bo Wang (bowang@vectorinstitute.ai)

> **Disclaimer**: This dataset is for research purposes only. The diagnostic reasoning is AI-generated and has not been clinically validated. Always consult qualified healthcare professionals for medical decisions.
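
## Example: Building Prompt/Completion Pairs

The card does not say how the fields listed under Dataset Structure were combined for fine-tuning. As a minimal sketch, assuming a simple prompt/completion layout, the helper below shows one way a row could be turned into a training pair. The function name `build_training_example`, the prompt wording, and the placeholder row are all hypothetical and do not come from the authors' code or from the dataset itself; only the five field names are taken from the card.

```python
# Field names as documented in the Dataset Structure table.
REQUIRED_FIELDS = (
    "case_id",
    "PostDescription",
    "DifferentialDiagnosisList",
    "FinalDiagnosis",
    "gptoss120b_reasoning",
)


def build_training_example(case: dict) -> dict:
    """Turn one dataset row into a hypothetical prompt/completion pair.

    The layout is an assumption made for illustration; it is not the
    format the dataset authors used for fine-tuning.
    """
    missing = [name for name in REQUIRED_FIELDS if name not in case]
    if missing:
        raise KeyError(f"case is missing fields: {missing}")

    # The prompt shows the model the case description and the candidates.
    prompt = (
        "Case description:\n"
        f"{case['PostDescription'].strip()}\n\n"
        "Differential diagnoses:\n"
        f"{case['DifferentialDiagnosisList'].strip()}\n\n"
        "Select the most likely diagnosis and explain your reasoning."
    )
    # The completion is the reasoning trace followed by the confirmed answer.
    completion = (
        f"{case['gptoss120b_reasoning'].strip()}\n\n"
        f"Final diagnosis: {case['FinalDiagnosis'].strip()}"
    )
    return {"case_id": case["case_id"], "prompt": prompt, "completion": completion}


# Placeholder row with dummy strings (NOT real dataset content).
placeholder_case = {
    "case_id": "case_0001",
    "PostDescription": "<history and findings>",
    "DifferentialDiagnosisList": "<differential list>",
    "FinalDiagnosis": "<final diagnosis>",
    "gptoss120b_reasoning": "<reasoning trace>",
}

example = build_training_example(placeholder_case)
print(example["prompt"])
print(example["completion"])
```

With the real data loaded as in the Usage section, the same function could be applied to every row with `train_data.map(build_training_example)`, which adds the `prompt` and `completion` columns alongside the original fields.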
