Annotated Corpus of PubMed Abstracts

Name: Annotated Corpus of PubMed Abstracts
Creator: School of Mathematics and Information Sciences, Sofia University
Published: 2019-12-04 15:22:32
License: No description available

Source: arXiv; updated 2019-12-04; indexed 2024-08-06

Download link:

http://arxiv.org/abs/1912.01831v1

Description:

This annotated corpus contains 750 PubMed abstracts and was built to study how the effects of treatments and substances are reported. It was created by the School of Mathematics and Information Sciences of Sofia University together with other institutions. Each abstract is annotated according to whether it describes the effect of a treatment or substance as positive, negative, or neutral. Building the corpus combined automatic processing with manual annotation, with particular attention to recognizing medical terms and abbreviations. The dataset is intended mainly for training text classifiers that identify the effects discussed in PubMed abstracts; the classifier reported for it currently reaches an accuracy of 78.80%. The dataset may also be useful in related research areas for improving the understanding and processing of medical text.
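As an illustration of the three-class task described above, here is a minimal sketch of a text classifier that assigns positive, negative, or neutral to a piece of text. It is not the classifier reported for this dataset: the listing does not say which model was used, so a small multinomial Naive Bayes with add-one smoothing is swapped in as an assumption. The sentences in TOY_TRAIN are invented placeholders, not abstracts from the corpus, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of (text, label) pairs the model labels correctly."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


# Invented placeholder sentences (assumption), NOT taken from the corpus.
TOY_TRAIN = [
    ("the treatment significantly improved symptoms", "positive"),
    ("the drug reduced pain and improved recovery", "positive"),
    ("the substance caused severe adverse effects", "negative"),
    ("treatment increased the risk of complications", "negative"),
    ("no significant difference was observed between groups", "neutral"),
    ("the treatment showed no measurable effect", "neutral"),
]

model = train(TOY_TRAIN)
print(predict(model, "the drug improved symptoms"))
```

With the real corpus, the (text, label) pairs would come from the annotated abstracts, and accuracy would be measured on held-out abstracts rather than on the training examples.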

Provider:

School of Mathematics and Information Sciences, Sofia University

Created:

2019-12-04
