Learning to Ask Like a Physician: a Discharge Summary Clinical Questions (DiSCQ) Dataset
Source: DataCite Commons · Updated: 2022-07-28 · Indexed: 2025-04-16
Download link:
https://physionet.org/content/discq/
Description:
Existing question answering (QA) datasets derived from electronic health
records (EHR) are artificially generated and consequently fail to capture
realistic physician information needs. We present Discharge Summary Clinical
Questions (DiSCQ), a newly curated question dataset composed of 2,000+
questions paired with the snippets of text (triggers) that prompted each
question. The questions are generated by medical experts from 100+ MIMIC-III
(version 1.4) discharge summaries. These summaries overlap with those used in
the n2c2 challenge, so they are filled in with surrogate protected health
information (PHI). We analyze this
dataset to characterize the types of information sought by medical experts. We
also train baseline models for trigger detection and question generation (QG),
paired with unsupervised answer retrieval over EHRs. Our baseline model
generates high-quality questions in over 62% of cases when prompted with
human-selected triggers. We release this dataset (and a link to all code
to reproduce baseline model results) to facilitate further research into
realistic clinical QA and QG.
Provider:
PhysioNet
Created:
2022-07-28



