SciAI, SciAD
Source: arXiv | Updated: 2020-10-28 | Indexed: 2024-06-21
Download links:
https://github.com/amirveyseh/AAAI-21-SDU-shared-task-1-AI (SciAI, acronym identification)
https://github.com/amirveyseh/AAAI-21-SDU-shared-task-2-AD (SciAD, acronym disambiguation)
Resource overview:
SciAI and SciAD are two large, manually annotated datasets for acronym identification and acronym disambiguation in scientific text. SciAI contains 17,506 sentences for acronym identification (locating acronyms and their long forms within a sentence), and SciAD contains 62,441 samples for acronym disambiguation (choosing the correct long form of an acronym from its possible expansions). Both were created by researchers in the Department of Computer and Information Science at the University of Oregon to address two limitations of earlier datasets: their small size and their concentration on the medical domain. To build them, the authors collected papers from arXiv, manually annotated acronyms and their long forms, and compiled a dictionary that maps each acronym to its multiple possible long forms. The datasets are intended to support applications that depend on correctly interpreting acronyms, such as text understanding, question answering, and document retrieval.
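To make the two tasks concrete, here is a minimal Python sketch under stated assumptions. It assumes identification labels use a BIO scheme with `B-short`/`I-short` tags for acronyms and `B-long`/`I-long` tags for long forms (check this against the released files), and it uses a tiny hypothetical `DICTIONARY` in place of the real acronym dictionary. The function names `extract_spans` and `disambiguate` are illustrative only, and the disambiguation step is a simple word-overlap baseline, not the dataset authors' model.

```python
from collections import Counter

# Hypothetical acronym dictionary: each acronym maps to its candidate long forms.
# These entries are invented for illustration, not taken from SciAD.
DICTIONARY = {
    "CNN": ["convolutional neural network", "condensed nearest neighbor"],
    "SVM": ["support vector machine"],
}


def extract_spans(tokens, labels):
    """Turn acronym identification tags into text spans.

    Assumes BIO tags B-short/I-short (acronyms), B-long/I-long (long forms), O.
    """
    spans = {"short": [], "long": []}
    current, kind = [], None
    # An "O" sentinel at the end forces the last open span to be flushed.
    for token, label in zip(list(tokens) + [""], list(labels) + ["O"]):
        if label.startswith("I-") and current and label[2:] == kind:
            current.append(token)  # continue the open span
            continue
        if current:  # any other tag closes the open span
            spans.setdefault(kind, []).append(" ".join(current))
        current, kind = ([token], label[2:]) if label.startswith("B-") else ([], None)
    return spans


def disambiguate(tokens, acronym_index, dictionary):
    """Acronym disambiguation by word overlap (a baseline, not the authors' model).

    Returns the candidate long form whose words appear most often in the rest of
    the sentence, or None if the acronym is not in the dictionary.
    """
    acronym = tokens[acronym_index]
    candidates = dictionary.get(acronym, [])
    if not candidates:
        return None
    context = Counter(t.lower() for i, t in enumerate(tokens) if i != acronym_index)
    # max() keeps the first candidate on ties.
    return max(candidates, key=lambda form: sum(context[w] for w in form.split()))


if __name__ == "__main__":
    tokens = ["Support", "vector", "machines", "(", "SVM", ")", "are", "used"]
    labels = ["B-long", "I-long", "I-long", "O", "B-short", "O", "O", "O"]
    print(extract_spans(tokens, labels))
    # {'short': ['SVM'], 'long': ['Support vector machines']}

    sentence = "The CNN uses convolutional layers in a neural network".split()
    print(disambiguate(sentence, 1, DICTIONARY))  # convolutional neural network
```

Word overlap is used here only because it needs no training data; it breaks down whenever the sentence shares no words with the correct long form.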
Provided by:
Department of Computer and Information Science, University of Oregon
Created:
2020-10-28



