
MEDDOCAN corpus: gold standard annotations for Medical Document Anonymization on Spanish clinical case reports

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/4279322
Description:
MEDDOCAN shared task dataset with Gold Standard annotations, split into training, development and test sets. The release also includes the documents of the MEDDOCAN background set, which are not annotated.

Annotation quality: Inter-annotator agreement is 98%. For more information, see the paper.

Format: Annotations are distributed in Brat format (see the Brat webpage for more information) and also in an XML format based on the i2b2 XML format. The MEDDOCAN webpage provides a script to convert between the MEDDOCAN-Brat, MEDDOCAN-XML, and i2b2 formats.

Shared task goal: In the three subtasks, the goal is to predict the annotations given only the plain text files.

Resources: Web, Silver Standard corpus, Annotation guidelines.

Citation: Montserrat Marimon et al. “Automatic De-identification of Medical Texts in Spanish: the MEDDOCAN Track, Corpus, Guidelines, Methods and Evaluation of Results.” In: IberLEF@SEPLN. 2019, pp. 618–638.

For further information, please visit https://temu.bsc.es/meddocan/ or email us at encargo-pln-life@bsc.es

Copyright (c) 2019 Secretaría de Estado para el Avance Digital (SEAD)
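As a rough illustration of the Brat format mentioned in the description, the sketch below reads text-bound annotations from a Brat standoff (.ann) string and checks them against the plain text. It assumes the general Brat convention of one entity per line in the form ID, TAB, "LABEL START END", TAB, surface text. The sample document, the offsets, and the label names used here are illustrative assumptions and are not taken from the corpus; the annotation guidelines define the actual label set. The function name parse_brat_entities is hypothetical.

```python
from typing import List, NamedTuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Parse text-bound ("T") annotations from Brat standoff content."""
    entities = []
    for line in ann_text.splitlines():
        # Only lines whose ID starts with "T" are text-bound annotations;
        # notes, relations, and attributes are skipped in this sketch.
        if not line.startswith("T"):
            continue
        ann_id, type_and_span, surface = line.split("\t", 2)
        label, span = type_and_span.split(" ", 1)
        # Discontinuous spans look like "0 5;10 15"; for simplicity this
        # sketch keeps the first start and the last end.
        fragments = [fragment.split() for fragment in span.split(";")]
        start = int(fragments[0][0])
        end = int(fragments[-1][1])
        entities.append(Entity(ann_id, label, start, end, surface))
    return entities


# Hypothetical sample, not taken from the MEDDOCAN corpus.
SAMPLE_TEXT = "Nombre: Ana Ruiz\nFecha: 12/03/2018\n"
SAMPLE_ANN = (
    "T1\tNOMBRE_SUJETO_ASISTENCIA 8 16\tAna Ruiz\n"
    "T2\tFECHAS 24 34\t12/03/2018\n"
)

if __name__ == "__main__":
    for entity in parse_brat_entities(SAMPLE_ANN):
        # Offsets index into the plain text file, so the slice should
        # reproduce the surface string stored in the annotation.
        matches = SAMPLE_TEXT[entity.start:entity.end] == entity.text
        print(entity.label, entity.start, entity.end, matches)
```

Checking each offset pair against the plain text is a cheap sanity test before training or evaluating a de-identification system, since the shared task asks for annotations predicted from the plain text files alone.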
Created:
2022-11-04