MiRoR11 - P2 - Annotated corpus for primary and reported outcomes extraction
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3234810
Description:
Annotated corpus for outcome extraction
This folder contains 2 subfolders:
1. Primary_outcomes - a corpus annotated for primary outcomes
The folder contains the following files:
po_sent_marked_p1_1000.txt - sentences 1 - 1000 of the annotated corpus, ConstruKT format; coordinated outcomes annotated as single entity
po_sent_marked_p2_1000.txt - sentences 1001 - 2000 of the annotated corpus, ConstruKT format; coordinated outcomes annotated as single entity
po_sent_marked_col_p1.txt - sentences 1 - 1000 of the annotated corpus, tabulated format; coordinated outcomes annotated as single entity
po_sent_marked_col_p2.txt - sentences 1001 - 2000 of the annotated corpus, tabulated format; coordinated outcomes annotated as single entity
po_sent_marked_col_p1_coord.txt - sentences 1 - 1000 of the annotated corpus, tabulated format; coordinated outcomes annotated as separate entities
po_sent_marked_col_p2_coord.txt - sentences 1001 - 2000 of the annotated corpus, tabulated format; coordinated outcomes annotated as separate entities
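
The file naming above can be summarized with a small lookup helper. This is only an illustrative sketch: the function name `primary_outcome_file` and its parameters are hypothetical, and it assumes a sentence number is simply the sentence's 1-based position in the 2000-sentence corpus, as the listing suggests. The file names themselves are taken from the list above.

```python
def primary_outcome_file(sentence_no, tabulated=True, coord_separate=False):
    """Return the listed file name that should contain a given sentence.

    Hypothetical helper; only the file names come from the corpus listing.
    """
    if not 1 <= sentence_no <= 2000:
        raise ValueError("the listing covers sentences 1 to 2000 only")

    # Part 1 holds sentences 1-1000, part 2 holds sentences 1001-2000.
    part = 1 if sentence_no <= 1000 else 2

    if not tabulated:
        # The listing shows ConstruKT files only for the single-entity annotation.
        if coord_separate:
            raise ValueError("no ConstruKT file with separate coordinated outcomes is listed")
        return f"po_sent_marked_p{part}_1000.txt"

    suffix = "_coord" if coord_separate else ""
    return f"po_sent_marked_col_p{part}{suffix}.txt"
```

For example, `primary_outcome_file(1500, coord_separate=True)` selects the tabulated file for the second half with coordinated outcomes annotated as separate entities.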
Subfolders:
po - the corpus for 10-fold cross-validation (10 subfolders with train/dev/test sets); coordinated outcomes annotated as single entity
po_coord - the corpus for 10-fold cross-validation (10 subfolders with train/dev/test sets); coordinated outcomes annotated as separate entities
2. Reported_outcomes - a corpus annotated for reported outcomes
The corpus contains sentences from the Results and Conclusions sections of the articles for which primary outcomes were annotated. The first part of the reported outcomes corpus contains sentences from the articles behind the first half of the primary outcomes corpus (sentences 1 - 1000). The second part contains sentences from the articles behind the second half of the primary outcomes corpus (sentences 1001 - 2000).
The folder contains the following files:
res_sent_marked_p1.txt - first part of the annotated corpus, ConstruKT format
res_sent_marked_p2.txt - second part of the annotated corpus, ConstruKT format
res_sent_marked_p1_col.txt - first part of the annotated corpus, tabulated format
res_sent_marked_p2_col.txt - second part of the annotated corpus, tabulated format
Subfolders:
rep - the corpus for 10-fold cross-validation (10 subfolders with train/dev/test sets)
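
The cross-validation folders (po, po_coord, rep) could be enumerated along the following lines. This is a minimal sketch under stated assumptions: the description does not give the names of the 10 fold subfolders or of the files inside them, so the code assumes each fold is a direct subfolder and that each split file has "train", "dev", or "test" somewhere in its name. The function name `find_folds` is hypothetical.

```python
from pathlib import Path

SPLITS = ("train", "dev", "test")


def find_folds(root):
    """Return one dict per fold subfolder, mapping split name to file path.

    Assumption: split files are recognized by 'train', 'dev', or 'test'
    appearing in the file name; the real layout may differ.
    """
    folds = []
    for fold_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        entry = {"fold": fold_dir.name}
        for f in sorted(fold_dir.iterdir()):
            if not f.is_file():
                continue
            for split in SPLITS:
                if split in f.name.lower():
                    entry[split] = f
        folds.append(entry)
    return folds
```

A call such as `find_folds("Reported_outcomes/rep")` would then be expected to return 10 entries if the layout matches these assumptions; a different count is a sign that the folder structure should be inspected by hand.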
Created:
2020-02-17



