SoMeSci
Source: arXiv (indexed 2025-09-30)
Download link:
https://data.gesis.org/somesci/
Description:
SoMeSci is a dataset for detecting software mentions in academic publications. It contains 39,768 sentences and 3,756 software mentions, split into a training set and a private test set. It covers six categories of entities, including applications, operating systems, plug-ins, programming environments, and software conferences. Each category is further divided into four mention types: creation, storage, mention, and usage.
Provider:
Organizing Committee of the Software Mention Detection shared-task