DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries
Source: DataCite Commons. Updated: 2022-04-12. Indexed: 2025-04-16.
Download link:
https://physionet.org/content/drugehrqa/
Description:
Electronic Health Records (EHR) contain patient records, stored in structured
tables as well as unstructured clinical notes. The information in structured
and unstructured EHR records is not strictly disjoint: information may be
duplicated, contradictory, or provide additional context between these
sources. This presents a rich opportunity to study question answering (QA)
models that combine reasoning over both structured and unstructured data. This
work presents DrugEHRQA, the first QA dataset containing question-answer pairs
from both the structured tables and the unstructured notes of MIMIC-III, a
publicly available EHR database. We are releasing a QA dataset over MIMIC-III
tables through PhysioNet, containing 41,417 triplets, each consisting of a
natural language question, its corresponding SQL query, and the answer
retrieved from the MIMIC-III tables. We also generated a QA dataset on
the unstructured clinical notes of MIMIC-III, which can be found in the n2c2
repository. The two datasets are combined to form a multimodal QA dataset
(DrugEHRQA), which contains question-answer pairs from both the structured and
unstructured data of MIMIC-III. DrugEHRQA covers medication-related queries
and contains over 70,000 question-answer pairs. Our goal is to provide
a benchmark dataset for multimodal QA systems and to open up new avenues of
research in improving question answering over EHR structured data by using
context from unstructured clinical data.
Provider:
PhysioNet
Created:
2021-09-16



