PMC-Patients
Source: arXiv. Indexed: 2025-09-30
Download link:
https://doi.org/10.1038/s41597-023-02814-8
Description:
This dataset contains de-identified patient summaries intended for generating United States Medical Licensing Examination (USMLE)-style questions. Each summary averages about 419 words. The dataset is relatively large; the entries used for question generation include 373 from a manually annotated set and 385 from a set generated by large language models. The target task is USMLE-style question generation.



