SIRIS-Lab/scilake-fulltext-corpus
Source: Hugging Face. Updated: 2026-04-19. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/SIRIS-Lab/scilake-fulltext-corpus
Description:
The SciLake Fulltext Corpus is a collection of full-text scientific papers that have been parsed and segmented by section, designed primarily for research on developing and evaluating NLP models. The dataset contains 1,000 full-text papers from various scientific domains, including Neuroscience, Cancer, Transport, and Energy, plus an additional 5,000 random papers from general scientific domains. All papers carry licenses that permit legal reuse, specifically CC-BY and Public Domain.
Provider:
SIRIS-Lab



