nsk7153/MedCalc-Bench
Hugging Face. Updated 2024-06-14, indexed 2024-06-22.
Download link:
https://hf-mirror.com/datasets/nsk7153/MedCalc-Bench
Resource description:
---
license: cc-by-sa-4.0
dataset_info:
  features:
  - name: Row Number
    dtype: int64
  - name: Calculator ID
    dtype: int64
  - name: Calculator Name
    dtype: string
  - name: Category
    dtype: string
  - name: Output Type
    dtype: string
  - name: Note ID
    dtype: string
  - name: Note Type
    dtype: string
  - name: Patient Note
    dtype: string
  - name: Question
    dtype: string
  - name: Relevant Entities
    dtype: string
  - name: Ground Truth Answer
    dtype: string
  - name: Lower Limit
    dtype: string
  - name: Upper Limit
    dtype: string
  - name: Ground Truth Explanation
    dtype: string
  splits:
  - name: train
    num_bytes: 41265307
    num_examples: 10053
  - name: test
    num_bytes: 4038342
    num_examples: 1047
  download_size: 19626866
  dataset_size: 45303649
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
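In the schema above, every column except Row Number and Calculator ID is stored as a string. This includes Ground Truth Answer, Lower Limit and Upper Limit, so they must be converted before any numeric comparison. The following is a minimal sketch for checking a loaded row against this schema. It assumes the `int64` columns arrive as Python `int`. `SCHEMA`, `EXPECTED_SPLIT_SIZES` and `check_row` are hypothetical names introduced only for illustration.

```python
# Column types transcribed from the YAML front matter (int64 -> int, string -> str).
SCHEMA: dict[str, type] = {
    "Row Number": int,
    "Calculator ID": int,
    "Calculator Name": str,
    "Category": str,
    "Output Type": str,
    "Note ID": str,
    "Note Type": str,
    "Patient Note": str,
    "Question": str,
    "Relevant Entities": str,
    "Ground Truth Answer": str,
    "Lower Limit": str,
    "Upper Limit": str,
    "Ground Truth Explanation": str,
}

# num_examples per split, as listed under `splits` in the front matter.
EXPECTED_SPLIT_SIZES = {"train": 10053, "test": 1047}


def check_row(row: dict) -> list[str]:
    """Return a list of schema problems for one row (an empty list means none)."""
    problems = []
    for column, expected in SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], expected):
            problems.append(
                f"{column}: expected {expected.__name__}, got {type(row[column]).__name__}"
            )
    return problems
```

Rows in this shape can be obtained, for example, with `load_dataset("nsk7153/MedCalc-Bench", split="test")` from the Hugging Face `datasets` library. Iterating over the result yields one dict per row, keyed by the column names above.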
MedCalc-Bench is the first medical calculation dataset used to benchmark LLMs' ability to serve as clinical calculators.
Each instance in the dataset consists of a patient note, a question asking to compute a specific clinical value, a final answer value,
and a step-by-step solution explaining how the final answer was obtained. Our dataset covers 55 different calculation tasks.
We hope this dataset serves as a call to improve the verbal and computational reasoning skills of LLMs in medical settings.
This dataset contains a training set of 10,053 instances (about 10.1k) and a test set of 1047 instances.
## Contents inside the Training and Testing CSV
The test set contains 1047 instances and the training set contains 10,053. Each row in the dataset contains the following information:
- Calculator Name
- Note ID
- Note Type: Extracted (if the note is from Open-Patients), Synthetic (if the note was handwritten by a clinician), or Template (if the note was generated using a template)
- Relevant Entities needed for the calculation
- Ground Truth Answer
- Lower Limit
- Upper Limit
- Ground Truth Explanation
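Note Type records where each patient note came from, so it can be used to break results down by note source. The following is a small sketch that assumes the three values are stored exactly as spelled above. `NOTE_TYPES` and `count_note_types` are hypothetical helpers.

```python
from collections import Counter

# Note Type values described in the dataset card (exact spelling assumed).
NOTE_TYPES = {
    "Extracted": "note taken from Open-Patients",
    "Synthetic": "note handwritten by a clinician",
    "Template": "note generated using a template",
}


def count_note_types(rows: list[dict]) -> dict[str, int]:
    """Count rows per Note Type, rejecting any value not listed in NOTE_TYPES."""
    counts = Counter(row["Note Type"] for row in rows)
    unknown = set(counts) - set(NOTE_TYPES)
    if unknown:
        raise ValueError(f"unexpected Note Type values: {sorted(unknown)}")
    return dict(counts)
```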
Note that for equation-based calculators whose output is a decimal, the Lower and Upper Limit are set to +/- 0.05%
of the ground truth answer. If the LLM's final answer falls between the lower and upper limit, the answer is considered correct.
For all other instances, the Lower Limit and Upper Limit are set to the same value as the ground truth. We allow this error margin for equation-based
calculators to accommodate rounding differences in intermediate steps. This issue does not arise for date-based equation calculators and rule-based calculators, where
the final value is independent of any rounding done in the intermediate steps.
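The tolerance rule above can be written out as follows. This is a sketch, not the authors' scoring code. It assumes the stated +/- 0.05% is a relative margin on the ground truth value and that both limits are inclusive. `limits` and `is_correct` are hypothetical names. Every row already stores its Lower Limit and Upper Limit, so in practice only the range check is needed.

```python
REL_TOL = 0.0005  # +/- 0.05% of the ground truth answer, as stated in the dataset card


def limits(ground_truth: float, decimal_equation: bool, rel_tol: float = REL_TOL) -> tuple[float, float]:
    """Return (lower, upper) for one instance.

    Equation-based calculators with decimal output get a relative margin;
    every other instance uses the ground truth for both limits.
    """
    if not decimal_equation:
        return ground_truth, ground_truth
    margin = abs(ground_truth) * rel_tol  # abs() keeps lower <= upper for negative values
    return ground_truth - margin, ground_truth + margin


def is_correct(answer: float, lower: float, upper: float) -> bool:
    """An answer counts as correct if it lies within [lower, upper] (inclusive, assumed)."""
    return lower <= answer <= upper
```

For a loaded row with numeric output, the stored strings can be converted and passed in directly: `is_correct(float(pred), float(row["Lower Limit"]), float(row["Upper Limit"]))`.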
## How to Use MedCalc-Bench
The training dataset of MedCalc-Bench can be used for fine-tuning LLMs. We provide both the fine-tuned models and the fine-tuning code in our repository: https://github.com/ncbi-nlp/MedCalc-Bench.
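The fine-tuning format used in the repository is not described on this card. As a purely hypothetical illustration, a training row can be turned into a supervised prompt/completion pair. The prompt contains the patient note and question, and the completion contains the step-by-step explanation followed by the answer. The template wording and the `to_sft_example` name are assumptions.

```python
def to_sft_example(row: dict) -> dict[str, str]:
    """Turn one MedCalc-Bench row into a prompt/completion pair.

    The template is a hypothetical illustration, not the official repository format.
    """
    prompt = (
        "Patient note:\n"
        f"{row['Patient Note']}\n\n"
        f"Question: {row['Question']}\n"
        "Explain your reasoning step by step, then give the final answer."
    )
    # Train on the worked explanation so the model learns the calculation steps.
    completion = f"{row['Ground Truth Explanation']}\nFinal answer: {row['Ground Truth Answer']}"
    return {"prompt": prompt, "completion": completion}
```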
The test set of MedCalc-Bench is useful for benchmarking LLMs under different settings. The README of our repository gives instructions for reproducing all of our results for every model and prompt setting.
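The following is a minimal sketch of scoring model outputs on the test set and reporting accuracy per Category. It makes two assumptions. First, each prediction has already been reduced to a bare final-answer string. Second, non-numeric answers (for example, dates) are scored by exact string match against the limits. `in_range` and `accuracy_by_category` are hypothetical helpers, not functions from the repository.

```python
from collections import defaultdict


def in_range(prediction: str, lower: str, upper: str) -> bool:
    """Numeric answers must fall within [lower, upper]; non-numeric answers
    (assumed here, e.g. dates) must match the limits exactly."""
    try:
        return float(lower) <= float(prediction) <= float(upper)
    except ValueError:
        return prediction.strip() == lower.strip() == upper.strip()


def accuracy_by_category(rows: list[dict], predictions: list[str]) -> dict[str, float]:
    """Fraction of correct predictions per Category; rows and predictions are aligned by index."""
    if len(rows) != len(predictions):
        raise ValueError("rows and predictions must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row, pred in zip(rows, predictions):
        category = row["Category"]
        totals[category] += 1
        correct[category] += in_range(pred, row["Lower Limit"], row["Upper Limit"])
    return {category: correct[category] / totals[category] for category in totals}
```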
By experimenting with different LLMs and prompts, we hope our dataset demonstrates the potential and limitations of LLMs in different settings.
## License
Both the training and testing dataset of MedCalc-Bench are released under the CC-BY-SA 4.0 license.
Provided by:
nsk7153
Summary of Original Information
Dataset Overview
Dataset Information
- Feature list:
  - Row Number: int64
  - Calculator ID: int64
  - Calculator Name: string
  - Category: string
  - Output Type: string
  - Note ID: string
  - Note Type: string
  - Patient Note: string
  - Question: string
  - Relevant Entities: string
  - Ground Truth Answer: string
  - Lower Limit: string
  - Upper Limit: string
  - Ground Truth Explanation: string
- Data splits:
  - train: 10053 examples, 41265307 bytes
  - test: 1047 examples, 4038342 bytes
- Dataset size:
  - Download size: 19626866 bytes
  - Total dataset size: 45303649 bytes
Dataset Configuration
- Default configuration:
  - train data files: data/train-*
  - test data files: data/test-*
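Dataset Contents
- Training and test sets:
  - The training set contains 10053 instances (about 10.1k)
  - The test set contains 1047 instances
  - Each instance contains the following information: Calculator Name, Note ID, Note Type (Extracted, Synthetic, or Template), Relevant Entities, Ground Truth Answer, Lower Limit, Upper Limit, Ground Truth Explanation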
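- Special notes:
  - For equation-based calculators with decimal output, the lower and upper limits are +/- 0.05% of the ground truth answer
  - For all other instances, the lower and upper limits equal the ground truth answer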
Dataset Usage
- Training set: used to fine-tune large language models (LLMs)
- Test set: used to benchmark LLMs under different settings
License
- The dataset is released under the CC-BY-SA 4.0 license



