nsk7153/MedCalc-Bench-Verified
Source: Hugging Face. Updated: 2025-12-19. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/nsk7153/MedCalc-Bench-Verified
Description:
MedCalc-Bench Verified is a re-verified version of MedCalc-Bench, used to benchmark the ability of large language models (LLMs) to serve as clinical calculators. Each instance in the dataset consists of a patient note, a question asking to compute a specific clinical value, a final answer value, and a step-by-step solution explaining how the final answer was obtained. Our dataset covers 55 different calculation tasks, each of which is either a rule-based or an equation-based calculation. The dataset contains a training set of 10,538 instances and a test set of 1,100 instances. We hope that our dataset and benchmark serve as a call to improve the computational reasoning skills of LLMs in medical settings.
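The instance layout described above (patient note, question, final answer, step-by-step solution) can be sketched as a small record type plus a scoring helper. This is only an illustration: the field names, the 5% relative tolerance, and the example values are assumptions made here, not the dataset's actual column names, official scoring rule, or real data.

```python
import math
from dataclasses import dataclass


@dataclass
class CalcInstance:
    # Hypothetical field names; check the dataset's real columns before use.
    patient_note: str
    question: str
    final_answer: float
    explanation: str


def is_correct(predicted: float, instance: CalcInstance, rel_tol: float = 0.05) -> bool:
    # Assumed rule: a prediction counts as correct when it lies within
    # rel_tol (relative) of the reference answer.
    return math.isclose(predicted, instance.final_answer, rel_tol=rel_tol)


def accuracy(predictions: list[float], instances: list[CalcInstance]) -> float:
    # Fraction of predictions judged correct; 0.0 for an empty evaluation set.
    if not instances:
        return 0.0
    hits = sum(is_correct(p, inst) for p, inst in zip(predictions, instances))
    return hits / len(instances)


# Made-up example of an equation-based task (body mass index).
example = CalcInstance(
    patient_note="Adult patient, weight 70 kg, height 1.75 m.",
    question="What is the patient's body mass index in kg/m^2?",
    final_answer=22.86,
    explanation="BMI = weight / height^2 = 70 / (1.75 * 1.75) = 22.86 kg/m^2.",
)

score = accuracy([22.9, 30.0], [example, example])
```

In this sketch the first prediction falls inside the assumed tolerance and the second does not, so `score` is 0.5. A real evaluation would replace the tolerance rule with whatever criterion the benchmark itself specifies.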
Provider:
nsk7153



