RAMBO 800+: A Corpus for the Development of Gene/Protein Recognition from Rare and Ambiguous Abbreviations

Source: DataCite Commons. Updated: 2020-07-26. Indexed: 2025-04-16.
Download link:
https://pub.uni-bielefeld.de/record/2673424
Description:
We release the RAMBO 800+ corpus, which provides manual annotations for Rare and AMBiguOus abbreviations of gene names in about 800 MEDLINE abstracts. It can be used to train gene recognition systems for this class of abbreviations, as discussed in Hartung et al. (BioNLP 2014). The corpus covers eight gene name abbreviation types: AHR, CLI, CLU, COPD, HF, MOX, PLS, and SAH. For each type, 100 abstracts (81 for MOX) were randomly sampled from MEDLINE. In each abstract, every mention of an abbreviation of interest was manually annotated as denoting a gene/protein or not. In addition, all other tokens in these abstracts were annotated in the same way.
Provider:
Bielefeld University
Created:
2014-04-25