Silver standard concept annotations from biomedical texts with special relevance to phenotypes
Source: DataCite Commons. Updated: 2020-09-04. Indexed: 2024-07-25.
Download link:
https://figshare.com/articles/dataset/Silver_standard_concept_annotations_from_biomedical_texts_with_special_relevance_to_phenotypes/1257838/1
Description:
This dataset documents the results of applying four concept recognition systems to the text mining of phenotypes from unstructured data. It comprises the individual stand-off annotations created by the systems on four corpora. In addition, for each textual corpus, system-based silver standard corpora have been created using both exact matching and sentence-level matching.

The four systems are:
* NCBO Annotator
* MetaMap
* cTAKES
* BeCAS

The four corpora are:
* Pubmed: A corpus of 2,163 publication abstracts from PubMed
* CT_Phenotype: A corpus of 906 clinical trials from http://clinicaltrials.gov/
* i2b2: The i2b2 corpus
* SHARE: The ShARE/CLEF eHealth 2013 Task 1 test dataset

All annotations are stored in a tab-separated stand-off format, in files named after the corresponding files in the original corpus. For the Pubmed and CT_Phenotype corpora, the file names are PubMed or Clinical Trials IDs, so the source documents can be retrieved directly from their original publishers.

The stand-off annotation format is: startOffset::endOffset [tab] original text span [tab] list of CUIs separated by comma. Silver standard corpora created from the system annotations omit the original text span and list only the offsets and the CUIs.
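
The stand-off annotation format described above can be read with a short parser. The following is a minimal sketch in Python, not code shipped with the dataset. The names `Annotation` and `parse_annotation_line` are hypothetical. It also assumes that silver standard lines carry exactly two tab-separated fields (offsets, then CUIs), which the description implies but does not state explicitly.

```python
from typing import List, NamedTuple, Optional


class Annotation(NamedTuple):
    """One stand-off annotation (hypothetical container, not part of the dataset)."""
    start: int
    end: int
    text: Optional[str]  # None for silver standard lines, which omit the text span
    cuis: List[str]


def parse_annotation_line(line: str) -> Annotation:
    """Parse 'start::end<TAB>text span<TAB>CUI,CUI,...'.

    Assumption: silver standard lines have two fields, 'start::end<TAB>CUI,CUI,...'.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        offsets, text, cui_field = fields
    elif len(fields) == 2:
        offsets, cui_field = fields
        text = None
    else:
        raise ValueError(f"expected 2 or 3 tab-separated fields, got {len(fields)}")

    # Offsets are written as startOffset::endOffset.
    start_str, end_str = offsets.split("::")

    # CUIs are comma-separated; tolerate stray whitespace and empty entries.
    cuis = [cui.strip() for cui in cui_field.split(",") if cui.strip()]
    return Annotation(int(start_str), int(end_str), text, cuis)
```

Calling `parse_annotation_line` on each line of an annotation file yields one `Annotation` per line. Whether the end offset is inclusive or exclusive is not stated in the description, so it is worth checking against the original text before slicing.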
Provider:
figshare
Created:
2016-01-19



