Silver standard concept annotations from biomedical texts with special relevance to phenotypes

Name: Silver standard concept annotations from biomedical texts with special relevance to phenotypes
Creator: figshare
Published: 2020-09-04 20:12:50
License: No description available

Source: DataCite Commons. Updated: 2020-09-04. Indexed: 2024-07-25.

Download link:

https://figshare.com/articles/dataset/Silver_standard_concept_annotations_from_biomedical_texts_with_special_relevance_to_phenotypes/1257838/1

Description:

This dataset documents the results of applying four concept recognition systems to the text mining of phenotypes from unstructured data. It comprises the individual stand-off annotations that the systems produced on four corpora. In addition, for each textual corpus, system-based silver standard corpora have been created using both exact matching and sentence-level matching.

The four systems are:
* NCBO Annotator
* MetaMap
* cTAKES
* BeCAS

The four corpora are:
* Pubmed: A corpus of 2,163 publication abstracts from Pubmed
* CT_Phenotype: A corpus of 906 clinical trials from http://clinicaltrials.gov/
* i2b2: The i2b2 corpus
* SHARE: The ShARE/CLEF e-health 2013 Task 1 testing dataset

All annotations are stored in a tab-separated stand-off format, in files named after the corresponding files in the original corpus. For the Pubmed and CT_Phenotype corpora, the file names are Pubmed or Clinical Trials IDs, so the source documents can be retrieved directly from their original publishers.

The stand-off annotation format is: startOffset::endOffset [tab] original text span [tab] list of CUIs separated by commas. Silver standard corpora created from the system annotations omit the original text span and list only the offsets and the CUIs.
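The stand-off format described above can be read with a few lines of Python. The following is a minimal sketch, not code shipped with the dataset: the names `Annotation` and `parse_line` are hypothetical, the sample lines in the usage note are invented for illustration, and the tolerance for spaces around the commas is an assumption. It accepts both the three-field system annotation lines and the two-field silver standard lines, which omit the text span.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    """One stand-off annotation (hypothetical container, not part of the dataset)."""
    start: int
    end: int
    text: Optional[str]  # None for silver standard lines, which omit the span
    cuis: List[str]


def parse_line(line: str) -> Annotation:
    """Parse 'start::end<TAB>span<TAB>CUI,CUI' or 'start::end<TAB>CUI,CUI'."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        # System annotation: offsets, original text span, CUI list
        offsets, text, cui_field = fields
    elif len(fields) == 2:
        # Silver standard annotation: offsets and CUI list only
        offsets, cui_field = fields
        text = None
    else:
        raise ValueError(f"expected 2 or 3 tab-separated fields, got {len(fields)}")

    # Offsets are written as startOffset::endOffset
    start_str, end_str = offsets.split("::")

    # Assumption: spaces around the commas may occur, so strip each CUI
    cuis = [c.strip() for c in cui_field.split(",") if c.strip()]
    return Annotation(int(start_str), int(end_str), text, cuis)
```

For example, `parse_line("10::18\theadache\tC0018681")` yields an `Annotation` whose `text` is `"headache"`, while `parse_line("10::18\tC0018681")` yields the same offsets and CUI with `text` set to `None`. Whether the end offset is inclusive or exclusive is not stated in the description, so it should be checked against the corpus text before slicing.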

Provider:

figshare

Created:

2016-01-19
