arXiv Scientific Statement Dataset
Source: arXiv. Indexed: 2025-09-30
Download link:
https://arxiv.org/abs/1908.10993
Description:
This large-scale supervised learning dataset is derived from arXiv.org. It comprises 10.5 million annotated paragraphs, each assigned one of 50 labels, and the task is scientific statement classification. Text and mathematical symbols are interleaved in a unified inline context, which facilitates complex modeling of scientific discourse.
Provider:
arXiv.org



