
MedReason

ModelScope community · updated 2026-01-06 · indexed 2025-04-26
Download link:
https://modelscope.cn/datasets/UCSC-VLAA/MedReason
Resource description:
# MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs

<p align="center">
📃 <a href="https://huggingface.co/papers/2504.00993" target="_blank">Paper</a> | 🤗 <a href="https://huggingface.co/UCSC-VLAA/MedReason-8B" target="_blank">MedReason-8B</a> | 📚 <a href="https://huggingface.co/datasets/UCSC-VLAA/MedReason" target="_blank">MedReason Data</a>
</p>

## ✨ Latest News
- [05/27/2025] 🎉 MedReason wins 3rd prize🏆 in the [Hugging Face Reasoning Datasets Competition](https://x.com/bespokelabsai/status/1910068013661118874)!

## ⚡Introduction
**MedReason** is a large-scale, high-quality medical reasoning dataset designed to enable faithful and explainable medical problem-solving in large language models (LLMs).
- We use a structured medical knowledge graph (KG) to convert clinical QA pairs into logical chains of reasoning, or "thinking paths".
- Our pipeline generates detailed reasoning for a variety of medical questions drawn from 7 medical datasets, resulting in a dataset of **32,682** question-answer pairs, each with a detailed, step-by-step explanation.
- By fine-tuning on the proposed [MedReason dataset](https://huggingface.co/datasets/UCSC-VLAA/MedReason), our best model, [MedReason-8B](https://huggingface.co/UCSC-VLAA/MedReason-8B), achieves *state-of-the-art* performance.

We open-sourced our chain-of-thought (CoT) dataset here.

## 🙏🏼 Acknowledgement
We gratefully acknowledge the inspiring work of [HuatuoGPT-o1](https://github.com/FreedomIntelligence/HuatuoGPT-o1), which laid important groundwork for this research. We also thank the developers of the excellent tools [curator](https://github.com/bespokelabsai/curator/), [trl](https://github.com/huggingface/trl), and [sglang](https://github.com/sgl-project/sglang) for making this work possible.

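## 🧩 Thinking-Path Sketch
The introduction describes turning a clinical QA pair into a "thinking path" through a knowledge graph. The sketch below is only a toy illustration of that idea, not the MedReason pipeline: it assumes a hand-written set of triples (`TOY_TRIPLES`) and uses a plain breadth-first search to connect an entity from the question to an entity from the answer. All names here (`TOY_TRIPLES`, `build_graph`, `shortest_path`, `format_path`) are hypothetical and chosen for illustration.

```python
from collections import deque

# Hypothetical toy knowledge graph as (head, relation, tail) triples.
# These are illustrative only and are not taken from the MedReason data.
TOY_TRIPLES = [
    ("polyuria", "symptom_of", "diabetes mellitus"),
    ("diabetes mellitus", "treated_with", "metformin"),
    ("polyuria", "symptom_of", "diabetes insipidus"),
    ("diabetes insipidus", "treated_with", "desmopressin"),
]


def build_graph(triples):
    """Build a directed adjacency map: head -> list of (relation, tail)."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
        graph.setdefault(tail, [])  # make sure every entity is a key
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of (head, relation, tail) hops,
    an empty list if start == goal, or None if no path exists."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None


def format_path(path):
    """Render a list of hops as a readable reasoning chain."""
    if not path:
        return ""
    parts = [path[0][0]]
    for _, relation, tail in path:
        parts.append(f"-[{relation}]-> {tail}")
    return " ".join(parts)


if __name__ == "__main__":
    graph = build_graph(TOY_TRIPLES)
    # Question entity: "polyuria"; answer entity: "desmopressin".
    print(format_path(shortest_path(graph, "polyuria", "desmopressin")))
```

In this toy setting the path links the symptom mentioned in the question to the treatment named in the answer, and the rendered chain is the kind of scaffold a step-by-step explanation could be written around. A real medical KG would be far larger and would need entity linking before any search.
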
## 📖 Citation
```
@misc{wu2025medreasonelicitingfactualmedical,
  title={MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs},
  author={Juncheng Wu and Wenlong Deng and Xingxuan Li and Sheng Liu and Taomian Mi and Yifan Peng and Ziyang Xu and Yi Liu and Hyunjin Cho and Chang-In Choi and Yihan Cao and Hui Ren and Xiang Li and Xiaoxiao Li and Yuyin Zhou},
  year={2025},
  eprint={2504.00993},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.00993},
}
```

Provider:
maas
Created:
2025-04-21