DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

DataCite Commons · Updated 2022-04-12 · Indexed 2025-04-16
Download link:
https://physionet.org/content/drugehrqa/
Description:
Electronic Health Records (EHRs) contain patient records stored in structured tables as well as in unstructured clinical notes. The information in structured and unstructured EHR data is not strictly disjoint: the two sources may duplicate or contradict each other, or one may provide additional context for the other. This presents a rich opportunity to study question answering (QA) models that reason jointly over structured and unstructured data. This work presents DrugEHRQA, the first QA dataset containing question-answer pairs from both the structured tables and the unstructured notes of MIMIC-III, a publicly available EHR database. We are releasing a QA dataset over the MIMIC-III tables through PhysioNet; it contains 41,417 triplets, each consisting of a natural language question, its corresponding SQL query, and the answer retrieved from the MIMIC-III tables. We also generated a QA dataset on the unstructured clinical notes of MIMIC-III, which can be found in the n2c2 repository. The two datasets are combined to form the multimodal QA dataset DrugEHRQA, which contains question-answer pairs from both the structured and unstructured data of MIMIC-III. DrugEHRQA consists of medication-related queries and contains over 70,000 question-answer pairs. Our goal is to provide a benchmark dataset for multimodal QA systems and to open up new avenues of research on improving question answering over structured EHR data by using context from unstructured clinical data.
Provider:
PhysioNet
Created:
2021-09-16