ScienceExamCER

Source: arXiv. Updated: 2019-11-24. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/1911.10436v1
Resource description:
ScienceExamCER is a densely annotated, fine-grained entity recognition corpus for the science exam domain, containing 133,000 entity mentions. Nearly all content words (96%) are annotated with fine-grained semantic category labels such as taxonomic groups, part-whole groups, verb/action groups, properties and values, and synonyms. The labels come from a manually constructed fine-grained taxonomy of 601 categories, generated through a data-driven analysis of 4,239 science exam questions. The dataset is intended to support downstream question answering tasks in the science domain that require dense semantic classification.
Provided by:
School of Information, University of Arizona
Created:
2019-11-24