
SoMeSci

Indexed from arXiv: 2025-09-30
Download link:
https://data.gesis.org/somesci/
Resource overview:
SoMeSci contains 39,768 sentences with 3,756 annotated software mentions, split into a training set and a private test set, for the task of detecting software mentions in academic publications. Entities fall into five software types (applications, operating systems, plug-ins, programming environments, and software coreferences), and each mention is further classified into one of four mention types: creation, deposition, mention, and usage.
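Because every mention carries both a software type and a mention type, the task can be framed as sequence labeling over combined labels. The sketch below is a hypothetical illustration, not the official SoMeSci or shared-task format: it assumes combined labels of the form `Application_Usage`, assumes a BIO tagging scheme, and uses the identifier spellings `SoftwareCoreference` and `Deposition` for the types the description renders as "software conferences/coreferences" and "storage/deposition". The example sentence is invented.

```python
from itertools import product

# Assumed identifier spellings for the five software types (not confirmed
# against the released files).
SOFTWARE_TYPES = ["Application", "OperatingSystem", "PlugIn",
                  "ProgrammingEnvironment", "SoftwareCoreference"]
# Assumed identifier spellings for the four mention types.
MENTION_TYPES = ["Creation", "Deposition", "Mention", "Usage"]


def build_label_set():
    """Return a hypothetical BIO tag set: 'O' plus B-/I- tags for every
    software-type x mention-type combination (5 x 4 = 20 combinations)."""
    tags = ["O"]
    for software, mention in product(SOFTWARE_TYPES, MENTION_TYPES):
        label = f"{software}_{mention}"
        tags.append(f"B-{label}")
        tags.append(f"I-{label}")
    return tags


def tag_sentence(tokens, spans):
    """Convert (start, end, label) token spans, end exclusive, to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"  # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the mention
    return tags


if __name__ == "__main__":
    print(len(build_label_set()))  # 1 + 2 * 20 = 41
    # Invented example sentence, for illustration only.
    tokens = ["Data", "were", "analysed", "with", "SPSS", "Statistics", "."]
    print(tag_sentence(tokens, [(4, 6, "Application_Usage")]))
```

If the released data does not use every type combination, the real label set would be smaller than the 41 tags produced here.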
Provided by:
Organizing Committee of the Software Mention Detection shared-task