Elsevier OA CC-BY Corpus

Name: Elsevier OA CC-BY Corpus
Creator: doi.org
License: No description provided

Indexed from doi.org on 2025-03-23

Download link:

http://doi.org/10.17632/zm33cdndxs.3

Description:

This is a corpus of 40k (40,001) open access (OA) CC-BY articles from across Elsevier’s journals. It represents the first cross-discipline research dataset at this scale to support NLP and ML research. The dataset was released to support the development of ML and NLP models targeting science articles from all research domains. While the release builds on other datasets designed for specific domains and tasks, it allows similar datasets to be derived from it and supports the development of models that can be applied and tested across domains.

Provider:

doi.org
