five

wanglab/eurorad-gpt-oss-training-data

收藏
Hugging Face2026-03-03 更新2026-04-05 收录
下载链接:
https://hf-mirror.com/datasets/wanglab/eurorad-gpt-oss-training-data
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: cc-by-4.0 task_categories: - text-generation - question-answering language: - en tags: - medical - radiology - diagnosis - clinical-reasoning - eurorad - clinical-decision-support size_categories: - 1K<n<10K --- <div align="center"> **Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support** </div> <div align="center"> <table align="center"> <tr> <td><a href="[arXiv link - to be added]" target="_blank"><img src="https://img.shields.io/badge/arXiv-Paper-FF6B6B?style=for-the-badge&logo=arxiv&logoColor=white" alt="Paper"></a></td> <td><a href="https://github.com/bowang-lab/on-device-LLM" target="_blank"><img src="https://img.shields.io/badge/GitHub-Code-181717?style=for-the-badge&logo=github&logoColor=white" alt="Code"></a></td> <td><a href="https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b" target="_blank"><img src="https://img.shields.io/badge/HuggingFace-Model-FFBF00?style=for-the-badge&logo=huggingface&logoColor=white" alt="HuggingFace Model"></a></td> <td><a href="https://huggingface.co/datasets/wanglab/eurorad-gpt-oss-training-data" target="_blank"><img src="https://img.shields.io/badge/HuggingFace-Dataset-28A745?style=for-the-badge&logo=huggingface&logoColor=white" alt="Dataset"></a></td> </tr> </table> </div> ## Authors <p align="center"> <a href="https://huggingface.co/alif-munim">Alif Munim</a><sup>* 1</sup>, <a href="https://scholar.google.com.hk/citations?hl=en&user=bW1UV4IAAAAJ">Jun Ma</a><sup>* 1,2</sup>, <a href="https://huggingface.co/omareng">Omar Ibrahim</a><sup>* 1</sup>, <b>Alhusain Abdalla</b><sup>* 1</sup>, Shuolin Yin<sup>3</sup>, Leo Chen<sup>4</sup>, <a href="https://scholar.google.ca/citations?user=37FDILIAAAAJ&hl=en">Bo Wang</a><sup>† 1,5,6,7,8</sup> </p> <p align="center"> <sup>*</sup> Equal contribution &nbsp;&nbsp;&nbsp; <sup>†</sup> Corresponding author </p> <p align="center"> <sup>1</sup>AI Collaborative Centre, University Health Network, Toronto, Canada<br> <sup>2</sup>Princess Margaret Cancer Centre, University Health Network, Toronto, Canada<br> <sup>3</sup>Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada<br> <sup>4</sup>Division of Urology, Department of Surgery, St. Michael's Hospital, Unity Health Toronto and University of Toronto, Toronto, Canada<br> <sup>5</sup>Peter Munk Cardiac Centre, University Health Network, Toronto, Canada<br> <sup>6</sup>Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada<br> <sup>7</sup>Department of Computer Science, University of Toronto, Toronto, Canada<br> <sup>8</sup>Vector Institute for Artificial Intelligence, Toronto, Canada </p> ## Highlights - 1,894 EuroRad medical radiology cases with expert-level diagnostic reasoning - Each case enhanced with detailed chain-of-thought reasoning generated by GPT-OSS 120B - Used to fine-tune [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b) for on-device clinical decision support - Part of a broader benchmark study on adapting on-device LLMs for medical tasks ## Dataset Overview This dataset contains 1,894 medical radiology cases sourced from Eurorad, each enhanced with detailed step-by-step diagnostic reasoning generated by GPT-OSS 120B. It was used to train the LoRA fine-tuned model available at [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b). Each case includes the clinical history, imaging findings, a differential diagnosis list, the confirmed final diagnosis, and a structured reasoning trace that maps symptoms to differentials and converges on the correct diagnosis. ## Dataset Structure | Field | Type | Description | |-------|------|-------------| | `case_id` | string | Unique case identifier (e.g., case_0001) | | `PostDescription` | string | Clinical history, patient presentation, and imaging findings | | `DifferentialDiagnosisList` | string | List of differential diagnoses considered | | `FinalDiagnosis` | string | Confirmed final diagnosis | | `gptoss120b_reasoning` | string | Step-by-step diagnostic reasoning generated by GPT-OSS 120B | ## Dataset Statistics | | | |--|--| | **Total Cases** | 1,894 | | **Source** | Eurorad (European Society of Radiology) | | **Reasoning Model** | GPT-OSS 120B | | **File Format** | Parquet | | **Language** | English | ## Usage ```python from datasets import load_dataset # Load the dataset dataset = load_dataset("wanglab/eurorad-gpt-oss-training-data") # Access training data train_data = dataset['train'] # Example: print first case print(train_data[0]) ``` ## Citation > Citation will be updated upon arXiv submission and journal publication. ```bibtex @article{munim2025ondevice, title={Benchmarking and Adapting On-Device Large Language Models for Clinical Decision Support}, author={Munim, Alif and Ma, Jun and Ibrahim, Omar and Abdalla, Alhusain and Yin, Shuolin and Chen, Leo and Wang, Bo}, journal={}, year={2025} } ``` ## Limitations - **Medical Validation Required**: Reasoning is AI-generated and requires clinical validation - **Not for Clinical Use**: Dataset is for research and training purposes only - **Specialty Coverage**: May not cover all radiology subspecialties equally - **Reasoning Quality**: AI-generated reasoning may contain errors or hallucinations - **Language**: English only ## Related Resources - **Fine-tuned Model**: [wanglab/on-device-LLM-gpt-oss-20b](https://huggingface.co/wanglab/on-device-LLM-gpt-oss-20b) - **Code Repository**: [github.com/bowang-lab/on-device-LLM](https://github.com/bowang-lab/on-device-LLM) ## Contact For issues and questions, please open a discussion in this repository. Corresponding author: Bo Wang — bowang@vectorinstitute.ai > **Disclaimer**: This dataset is for research purposes only. The diagnostic reasoning is AI-generated and has not been clinically validated. Always consult qualified healthcare professionals for medical decisions.
提供机构:
wanglab
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作