Me-LLaMA: Foundation Large Language Models for Medical Applications

Source: physionet.org (indexed 2025-01-21)
Download link:
https://physionet.org/content/me-llama/1.0.0/
Resource description:
Recent advances in large language models (LLMs) such as ChatGPT and LLaMA point to their potential to transform medical applications, yet in clinical settings these models often fall short because they lack specialized training on medical data. To address this challenge, this study introduces Me-LLaMA, a family of medical LLMs comprising the foundation models Me-LLaMA 13B and 70B and their chat-enhanced versions, Me-LLaMA 13B-chat and 70B-chat, developed through continual pre-training and instruction tuning of LLaMA2 on large medical datasets. Our methodology draws on a comprehensive domain-specific data suite: a large-scale continual pre-training dataset with 129B tokens, an instruction tuning dataset with 214k samples, and a new medical evaluation benchmark (MIBE) spanning six critical medical tasks and 12 datasets. Our extensive evaluation on MIBE shows that Me-LLaMA models achieve better overall performance than existing open-source medical LLMs in zero-shot, few-shot, and supervised learning settings. With task-specific instruction tuning, Me-LLaMA models outperform ChatGPT on 7 of 8 datasets and GPT-4 on 5 of 8 datasets. We also investigated catastrophic forgetting, and our results show that Me-LLaMA models mitigate it better than other open-source medical LLMs. Me-LLaMA is one of the largest open-source medical foundation LLMs trained on both biomedical and clinical data. It outperforms other open-source medical LLMs on both general and medical tasks, making it an attractive choice for medical AI applications.

Provider:
physionet.org