KaiNylund/arxiv-year-splits
Source: Hugging Face | Updated 2023-12-04 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/KaiNylund/arxiv-year-splits
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: 2006_2008_train
    num_bytes: 100484371
    num_examples: 120937
  - name: 2006_2008_test
    num_bytes: 10050474
    num_examples: 12157
  - name: 2009_2011_train
    num_bytes: 145839572
    num_examples: 157401
  - name: 2009_2011_test
    num_bytes: 15067693
    num_examples: 16306
  - name: 2012_2014_train
    num_bytes: 149239610
    num_examples: 153162
  - name: 2012_2014_test
    num_bytes: 15064105
    num_examples: 15440
  - name: 2015_2017_train
    num_bytes: 150547411
    num_examples: 136762
  - name: 2015_2017_test
    num_bytes: 15057851
    num_examples: 13745
  - name: 2018_2020_train
    num_bytes: 150517629
    num_examples: 129279
  - name: 2018_2020_test
    num_bytes: 15052957
    num_examples: 12885
  download_size: 474674602
  dataset_size: 766921673
---
# Dataset Card for "arxiv-year-splits"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
KaiNylund
Summary of original information
Dataset overview
Data features
- Name: text
- Data type: string
Data splits
- 2006_2008_train
  - Bytes: 100484371
  - Examples: 120937
- 2006_2008_test
  - Bytes: 10050474
  - Examples: 12157
- 2009_2011_train
  - Bytes: 145839572
  - Examples: 157401
- 2009_2011_test
  - Bytes: 15067693
  - Examples: 16306
- 2012_2014_train
  - Bytes: 149239610
  - Examples: 153162
- 2012_2014_test
  - Bytes: 15064105
  - Examples: 15440
- 2015_2017_train
  - Bytes: 150547411
  - Examples: 136762
- 2015_2017_test
  - Bytes: 15057851
  - Examples: 13745
- 2018_2020_train
  - Bytes: 150517629
  - Examples: 129279
- 2018_2020_test
  - Bytes: 15052957
  - Examples: 12885
Dataset size
- Download size: 474674602 bytes
- Dataset size: 766921673 bytes
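
The split metadata above can be cross-checked and used to pick a split programmatically. The following is a minimal sketch, not part of the original card: the helper names `split_name`, `totals`, and `load_period` are hypothetical, the numbers are copied from the listing above, and `load_period` assumes the Hugging Face `datasets` package is installed and the Hub is reachable.

```python
# Split metadata copied from the dataset card: name -> (num_bytes, num_examples).
SPLITS = {
    "2006_2008_train": (100484371, 120937),
    "2006_2008_test": (10050474, 12157),
    "2009_2011_train": (145839572, 157401),
    "2009_2011_test": (15067693, 16306),
    "2012_2014_train": (149239610, 153162),
    "2012_2014_test": (15064105, 15440),
    "2015_2017_train": (150547411, 136762),
    "2015_2017_test": (15057851, 13745),
    "2018_2020_train": (150517629, 129279),
    "2018_2020_test": (15052957, 12885),
}
DATASET_SIZE = 766921673  # dataset_size from the card, in bytes


def split_name(start_year: int, part: str) -> str:
    """Build a split name such as '2006_2008_train' (hypothetical helper)."""
    name = f"{start_year}_{start_year + 2}_{part}"
    if name not in SPLITS:
        raise KeyError(f"unknown split: {name}")
    return name


def totals(part: str) -> tuple[int, int]:
    """Sum (bytes, examples) over every split whose name ends in `part`."""
    rows = [v for k, v in SPLITS.items() if k.endswith("_" + part)]
    return sum(b for b, _ in rows), sum(n for _, n in rows)


def load_period(start_year: int, part: str = "train"):
    """Load one split from the Hub (assumes `datasets` is installed; needs network)."""
    from datasets import load_dataset  # imported lazily so the checks below run offline

    return load_dataset("KaiNylund/arxiv-year-splits", split=split_name(start_year, part))


if __name__ == "__main__":
    train_bytes, train_examples = totals("train")
    test_bytes, test_examples = totals("test")
    print("train:", train_bytes, "bytes,", train_examples, "examples")
    print("test: ", test_bytes, "bytes,", test_examples, "examples")
    # The per-split byte counts add up to the reported dataset_size.
    print("sizes consistent:", train_bytes + test_bytes == DATASET_SIZE)
```

For example, `load_period(2018, "test")` would request the `2018_2020_test` split; each record is expected to expose the single `text` string field listed under the data features.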



