
Silver standard concept annotations from biomedical texts with special relevance to phenotypes

Source: DataCite Commons; updated 2020-09-04; indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/Silver_standard_concept_annotations_from_biomedical_texts_with_special_relevance_to_phenotypes/1257838/1
Description:
This dataset documents the results of applying four concept recognition systems to mining phenotypes from unstructured text. It comprises the individual stand-off annotations created by each system on four corpora. In addition, for each corpus, system-based silver standard corpora have been created using both exact matching and sentence-level matching.

The four systems are:
* NCBO Annotator
* MetaMap
* cTAKES
* BeCAS

The four corpora are:
* Pubmed: a corpus of 2,163 publication abstracts from PubMed
* CT_Phenotype: a corpus of 906 clinical trials from http://clinicaltrials.gov/
* i2b2: the i2b2 corpus
* SHARE: the ShARe/CLEF eHealth 2013 Task 1 test dataset

All annotations are stored in a tab-separated stand-off format, in files named after the corresponding files of the original corpus. For the Pubmed and CT_Phenotype corpora, the file names are PubMed or ClinicalTrials.gov IDs, so the original documents can be retrieved directly from their publishers.

The stand-off annotation format is:
startOffset::endOffset [tab] original text span [tab] comma-separated list of UMLS Concept Unique Identifiers (CUIs)

Silver standard corpora created from the system annotations omit the original text span and list only the offsets and the CUIs.
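As a sketch of how these files might be consumed, the Python below parses one stand-off line into offsets, optional text span and CUIs, then builds an exact-match agreement set across systems. Several details are assumptions, not taken from the dataset: files are read as UTF-8; a silver-standard line is taken to be two tab-separated fields (`start::end` and the CUI list); offsets are kept exactly as written, since the description does not say whether the end offset is inclusive. The names `parse_line`, `read_annotations`, `exact_match_silver` and the `min_systems` threshold are hypothetical, and the voting rule is only one plausible reading of "exact matching", not necessarily how the authors built their silver standards. Sentence-level matching is not covered.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Annotation:
    start: int
    end: int
    text: Optional[str]    # None for silver-standard lines (no text span)
    cuis: Tuple[str, ...]


def parse_line(line: str) -> Annotation:
    """Parse one stand-off line.

    System annotation:          start::end<TAB>text span<TAB>CUI1,CUI2
    Silver standard (assumed):  start::end<TAB>CUI1,CUI2
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        offsets, text, cui_field = fields
    elif len(fields) == 2:
        offsets, cui_field = fields
        text = None
    else:
        raise ValueError(f"unexpected number of fields: {line!r}")
    # Raises ValueError if the offsets are not exactly "start::end".
    start, end = (int(x) for x in offsets.split("::"))
    cuis = tuple(c.strip() for c in cui_field.split(",") if c.strip())
    return Annotation(start, end, text, cuis)


def read_annotations(path: str) -> List[Annotation]:
    """Read all non-empty lines of one annotation file (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as fh:
        return [parse_line(line) for line in fh if line.strip()]


def exact_match_silver(
    per_system: Dict[str, List[Annotation]], min_systems: int
) -> Set[Tuple[int, int, str]]:
    """Keep (start, end, CUI) triples produced by at least min_systems systems.

    Hypothetical voting rule: the dataset description does not state a threshold.
    """
    votes: Counter = Counter()
    for annotations in per_system.values():
        # A set, so each system votes at most once per triple.
        triples = {(a.start, a.end, cui) for a in annotations for cui in a.cuis}
        votes.update(triples)
    return {triple for triple, n in votes.items() if n >= min_systems}
```

For example, `parse_line("10::18\tdiabetes\tC0011849")` yields an annotation with offsets 10 and 18, text `"diabetes"` and one CUI (illustrative values). Voting on `(start, end, CUI)` triples means two systems agree only when they mark identical offsets and assign the same concept, which is the strictest form of matching.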
Provider:
figshare
Created:
2016-01-19