
GGPONC 2.1 (Minor Release with Silver Standard Annotations)

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12520368
Description:
About this Release

Version 2.1 (minor release) contains more recent oncology guidelines (XML, plain text), metadata, and predictions from GGPONC 2.0 NER models and UMLS concepts identified with xMEN. In total, this version contains 32 guidelines with 2.1M tokens of text. If you are looking for the GGPONC 2.0 gold standard annotations, please refer to the major release: https://zenodo.org/records/12518458

Project Description

The GGPONC project aims to provide a freely distributable corpus of German medical text for NLP researchers. Clinical guidelines are particularly suitable for creating such corpora because they contain no protected health information (PHI), which distinguishes them from other kinds of medical text.

The second version of the corpus (GGPONC 2.0) consists of 30 German oncology guidelines with 1.87 million tokens. It was annotated entirely by hand at the entity level by 7 medical students using the INCEpTION platform, over a period of 6 months and more than 1200 hours of work. This makes GGPONC 2.0 the largest annotated, freely distributable corpus of German medical text at the moment.

Annotated entities are Findings (Diagnosis / Pathology, Other Finding), Substances (Clinical Drug, Nutrients / Body Substances, External Substances) and Procedures (Therapeutic, Diagnostic), as well as Specifications for these entities. In total, annotators have created more than 200,000 entity annotations. In addition, fragment relationships have been annotated to explicitly mark elliptical coordinated noun phrases, a common phenomenon in German text.
Created:
2024-06-24