GPTNERMED
Source: arXiv | Updated: 2022-08-31 | Indexed: 2024-06-21
Download link:
https://github.com/frankkramer-lab/GPTNERMED
Description:
GPTNERMED is a custom dataset for German medical text, created by Johann Frei and Frank Kramer of the Department of Information Science, University of Augsburg. The dataset is synthetic: it was generated with a pretrained language model and is intended for training medical named entity recognition (NER) models. It contains 245,107 annotated tokens covering three entity categories: drugs, dosages, and diagnoses. To create it, the researchers relied on the few-shot learning ability of pretrained language models, writing input prompts in a simple markup language so that the model produces text with the annotations already embedded. GPTNERMED targets NLP tasks on German medical text and aims to address the shortage of datasets and pretrained models for non-English medical NLP.
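As a rough illustration of how text with inline annotations can be turned into token-level NER training data, the sketch below parses a hypothetical markup of the form `[entity text](Label)` and derives BIO tags. The markup syntax, the label names (`Drug`, `Dosage`, `Diagnosis`), the example sentence, and the function names are all assumptions made for this example; the actual GPTNERMED format may differ, so consult the repository linked above before relying on it.

```python
import re

# Hypothetical inline markup: [entity text](Label).
# This pattern is an assumption for illustration, not the GPTNERMED format.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(text):
    """Strip the markup and return (plain_text, spans).

    Each span is (start, end, label) in character offsets of plain_text.
    """
    plain_parts = []
    spans = []
    cursor = 0   # read position in the marked-up input
    out_len = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(text):
        before = text[cursor:match.start()]
        plain_parts.append(before)
        out_len += len(before)
        entity, label = match.group(1), match.group(2)
        spans.append((out_len, out_len + len(entity), label))
        plain_parts.append(entity)
        out_len += len(entity)
        cursor = match.end()
    plain_parts.append(text[cursor:])
    return "".join(plain_parts), spans


def to_bio(plain, spans):
    """Split on whitespace and assign a BIO tag to each token."""
    tagged = []
    for tok in re.finditer(r"\S+", plain):
        tag = "O"
        for start, end, label in spans:
            if start <= tok.start() < end:
                prefix = "B-" if tok.start() == start else "I-"
                tag = prefix + label
                break
        tagged.append((tok.group(), tag))
    return tagged


# Invented example sentence, not taken from the dataset.
example = (
    "Der Patient erhielt [Ibuprofen](Drug) [400 mg](Dosage) "
    "wegen [Migräne](Diagnosis)."
)
plain, spans = parse_annotated(example)
bio = to_bio(plain, spans)
```

Whitespace tokenization keeps the sketch short; a real pipeline would more likely use the tokenizer of the NER model being trained, and align labels to its subword tokens.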
Provided by:
Department of Information Science, University of Augsburg
Created:
2022-08-31



