Asclepius-R: Clinical Large Language Model Built on MIMIC-III Discharge Summaries
Source: DataCite Commons. Updated: 2024-03-26. Indexed: 2024-07-13.
Download link:
https://physionet.org/content/asclepius-r/1.0.1/
Description:
The development of large language models tailored for handling patients'
clinical notes is often hindered by the limited accessibility and usability of
these notes due to strict privacy regulations. To address these challenges, we
first create synthetic large-scale clinical notes using publicly available
case reports extracted from biomedical literature. We then use these synthetic
notes to train our specialized clinical large language model, Asclepius. While
Asclepius is trained on synthetic data, we assess its potential performance in
real-world applications by evaluating it using real clinical notes. We
benchmark Asclepius against several other large language models, including
GPT-3.5-turbo and other open-source alternatives. To further validate our
approach using synthetic notes, we also compare Asclepius with its variants
trained on real clinical notes. Our findings convincingly demonstrate that
synthetic clinical notes can serve as viable substitutes for real ones when
constructing high-performing clinical language models. This conclusion is
supported by detailed evaluations conducted by both GPT-4 and medical
professionals. All resources used in the development of Asclepius, including
weights, code, and data, are made publicly accessible for future research.
Specifically, this repository contains Asclepius-R, a variant of Asclepius
that was trained on MIMIC-III discharge summaries. All other resources are also
publicly accessible.
Provider:
PhysioNet
Created:
2024-01-30



