MIMIC-IV-Ext Clinical Decision Making: A MIMIC-IV Derived Dataset for Evaluation of Large Language Models on the Task of Clinical Decision Making for Abdominal Pathologies

physionet.org, indexed 2025-03-22
Download link:
https://physionet.org/content/mimic-iv-ext-cdm/1.0/
Resource description:
Clinical decision making is one of the most impactful parts of a physician's responsibilities and stands to benefit greatly from AI solutions such as large language models (LLMs). However, while many datasets exist to test the performance of AI models on constructed case vignettes, such as medical licensing exams, these tests fail to assess many skills that are necessary for deployment in a realistic clinical decision making environment. To understand how useful LLMs are in real-world settings, we must evaluate them in the wild, i.e. on real-world data under realistic conditions. To address this need, we have created a curated dataset based on the MIMIC-IV database, spanning 2400 real patient cases and four common abdominal pathologies: appendicitis, cholecystitis, diverticulitis, and pancreatitis. Each patient case contains the filtered and curated information necessary to arrive at the delivered diagnosis of the physician and can be used in an interactive manner to test the information gathering, synthesizing, and diagnostic capabilities of AI models.

Provider:
physionet.org