BioEsCorpus
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6699942
Description:
This folder contains the files and resources obtained in the process of annotating 18 Spanish clinical reports from the Spanish Clinical Case Corpus (SPACCC) (https://doi.org/10.5281/zenodo.2560316) with biomedical entities and semantic relations.
Three annotators had to identify the following eleven types of entities: Gen, Proteína, Glúcido, Lípido, Enfermedad, Síntoma, Signo, Medicamento, Alias, Abreviatura and Sigla.
They also had to identify the following eight semantic relations: "Implicado en", "Activa", "Inhibe", "Interacciona con", "Previene", "Alivia", "Cura" and "Refiere a".
In total, 324 entities were identified, covering ten of the eleven entity types, along with 170 relations, covering five of the eight relation types.
Content:
- brat_annotations: Contains three folders, one per annotator. Each folder holds the eighteen annotations made by that annotator, in brat format.
- Clinical_Reports_SPACCC: Contains the 18 original Spanish clinical reports (.txt) from SPACCC.
- Pub_Annotations: Contains three folders, one per annotator. Each folder holds eighteen JSON files with the annotations in PubAnnotation format, which is the original output from TextAE.
- Annotation_guideline_Tool_Usage_Guide.pdf: PDF file containing, first, a guide in Spanish on how to annotate using TextAE and, second, the annotation guideline, also in Spanish, given to the annotators with instructions on how to proceed with the annotations.
The scripts used to produce this data can be found in the GitHub repository: https://github.com/LuciaSG99/BioEsCorpus.git
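As an illustration of how the files in brat_annotations might be read, the following is a minimal Python sketch that parses brat standoff text into entities and relations and tallies them by type. It is not taken from the project's own scripts. The function name `parse_brat` and the sample lines are hypothetical, and the sketch assumes the files follow the usual brat standoff layout (tab-separated `T` lines for entities, `R` lines for relations); the exact label spellings used in the corpus files may differ from those shown.

```python
from collections import Counter


def parse_brat(ann_text):
    """Parse brat standoff text into (entities, relations).

    Hypothetical helper for illustration; other annotation kinds
    (events, attributes, notes) are ignored.
    """
    entities, relations = [], []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T") and len(fields) >= 3:
            # Second field is "Type start end"; discontinuous spans
            # are written as "start end;start end".
            label, _, offsets = fields[1].partition(" ")
            spans = [tuple(int(n) for n in part.split())
                     for part in offsets.split(";")]
            entities.append({"id": ann_id, "type": label,
                             "spans": spans, "text": fields[2]})
        elif ann_id.startswith("R") and len(fields) >= 2:
            # Second field is "Type Arg1:Tx Arg2:Ty".
            tokens = fields[1].split()
            args = dict(t.split(":", 1) for t in tokens if ":" in t)
            label = " ".join(t for t in tokens if ":" not in t)
            relations.append({"id": ann_id, "type": label, "args": args})
    return entities, relations


# Made-up sample lines, not copied from the corpus.
SAMPLE = (
    "T1\tEnfermedad 10 18\tdiabetes\n"
    "T2\tMedicamento 40 50\tmetformina\n"
    "R1\tAlivia Arg1:T2 Arg2:T1\n"
)

entities, relations = parse_brat(SAMPLE)
entity_counts = Counter(e["type"] for e in entities)
relation_counts = Counter(r["type"] for r in relations)
```

To apply this to one annotator's folder, the text of each .ann file could be read (for example with `pathlib.Path.read_text(encoding="utf-8")`) and passed to `parse_brat`, summing the counters across the eighteen files.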
These resources are freely distributed under a Creative Commons Attribution 4.0 International License.
The project was authored by Lucía Sánchez González and supervised by Carlos Badenes Olmedo and María Poveda Villalón.
Created:
2022-06-23



