Elsevier OA CC-BY Corpus
Source: arXiv | Updated: 2020-09-15 | Indexed: 2024-06-21
Download link:
https://data.mendeley.com/datasets/zm33cdndxs/3
Description:
Elsevier OA CC-BY Corpus is a large interdisciplinary dataset created by Elsevier Limited, containing 40,091 open access scientific research papers. Besides the full text of each paper, it includes document metadata and bibliographic information for each reference. The papers were selected by stratified sampling to give balanced representation across academic disciplines, in support of natural language processing and machine learning research. The dataset aims to address the limited availability of datasets for interdisciplinary research and to advance the development and application of scientific text processing technologies.
Provider:
Elsevier Limited
Created:
2020-08-03



