RealTimeData/arxiv_alltime
Source: Hugging Face | Updated: 2025-05-28 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/RealTimeData/arxiv_alltime
Summary:
This dataset contains multiple configurations, each corresponding to one calendar month of arXiv papers and named after that month (e.g. 2017-01); together the configurations cover a span such as January 2017 to November 2020. Every configuration has the same features: entry_id (entry ID), published (publication date), title, authors (list of authors), primary_category (primary category), categories (list of categories), and text (text content). Each configuration provides only a train split, whose size in bytes and number of examples are recorded.
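Because each configuration is one month named YYYY-MM, selecting a date range amounts to enumerating month names and loading the train split of each. The sketch below assumes that naming pattern holds for every configuration and that the Hugging Face datasets library is installed for the loading step; month_configs and load_months are hypothetical helper names, not part of the dataset's own tooling.

```python
def month_configs(start: str, end: str) -> list[str]:
    """Return 'YYYY-MM' config names from start to end, inclusive.

    Assumption: configuration names follow the YYYY-MM pattern of the listing.
    """
    y, m = map(int, start.split("-"))
    end_y, end_m = map(int, end.split("-"))
    if (y, m) > (end_y, end_m):
        raise ValueError("start must not be after end")
    names = []
    while (y, m) <= (end_y, end_m):
        names.append(f"{y:04d}-{m:02d}")
        # Advance one month, rolling over into the next year after December.
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return names


def load_months(start: str, end: str) -> dict:
    """Hypothetical helper: load the train split of each monthly config.

    Needs the `datasets` package and network access; it is imported lazily
    so month_configs works without it.
    """
    from datasets import load_dataset

    return {
        name: load_dataset("RealTimeData/arxiv_alltime", name, split="train")
        for name in month_configs(start, end)
    }
```

For example, month_configs("2017-11", "2018-02") yields the four names 2017-11, 2017-12, 2018-01 and 2018-02, which load_months would then pass one by one as the config name.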
Provided by:
RealTimeData
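Original information summary
Dataset overview
The dataset has multiple configurations, each covering a different month. The details of each configuration are as follows:
Dataset configurations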
- 2017-01
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 19895148 bytes, 482 examples
  - Download size: 9877238 bytes
  - Dataset size: 19895148 bytes
- 2017-02
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 20111448 bytes, 499 examples
  - Download size: 9967413 bytes
  - Dataset size: 20111448 bytes
- 2017-03
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 20815725 bytes, 500 examples
  - Download size: 10425653 bytes
  - Dataset size: 20815725 bytes
- 2017-04
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21575576 bytes, 527 examples
  - Download size: 10815992 bytes
  - Dataset size: 21575576 bytes
- 2017-05
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 18573038 bytes, 473 examples
  - Download size: 9309268 bytes
  - Dataset size: 18573038 bytes
- 2017-06
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 22890828 bytes, 507 examples
  - Download size: 11343584 bytes
  - Dataset size: 22890828 bytes
- 2017-07
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 19960611 bytes, 493 examples
  - Download size: 10152091 bytes
  - Dataset size: 19960611 bytes
- 2017-08
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 19273098 bytes, 474 examples
  - Download size: 9615408 bytes
  - Dataset size: 19273098 bytes
- 2017-09
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 22552151 bytes, 532 examples
  - Download size: 11305139 bytes
  - Dataset size: 22552151 bytes
- 2017-10
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21441238 bytes, 496 examples
  - Download size: 10519666 bytes
  - Dataset size: 21441238 bytes
- 2017-11
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 20655484 bytes, 520 examples
  - Download size: 10411397 bytes
  - Dataset size: 20655484 bytes
- 2017-12
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 19708202 bytes, 479 examples
  - Download size: 9849435 bytes
  - Dataset size: 19708202 bytes
- 2018-01
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 18090140 bytes, 488 examples
  - Download size: 9163072 bytes
  - Dataset size: 18090140 bytes
- 2018-02
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 25638031 bytes, 530 examples
  - Download size: 12602449 bytes
  - Dataset size: 25638031 bytes
- 2018-03
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 19922782 bytes, 512 examples
  - Download size: 10043038 bytes
  - Dataset size: 19922782 bytes
- 2018-04
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 20318335 bytes, 499 examples
  - Download size: 10264944 bytes
  - Dataset size: 20318335 bytes
- 2018-05
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 19116513 bytes, 493 examples
  - Download size: 9561998 bytes
  - Dataset size: 19116513 bytes
- 2018-06
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21277471 bytes, 511 examples
  - Download size: 10625238 bytes
  - Dataset size: 21277471 bytes
- 2018-07
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 20322860 bytes, 517 examples
  - Download size: 10250233 bytes
  - Dataset size: 20322860 bytes
- 2018-08
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 20466912 bytes, 504 examples
  - Download size: 10207103 bytes
  - Dataset size: 20466912 bytes
- 2018-09
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21521957 bytes, 516 examples
  - Download size: 10292535 bytes
  - Dataset size: 21521957 bytes
- 2018-10
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 22892365 bytes, 532 examples
  - Download size: 11360268 bytes
  - Dataset size: 22892365 bytes
- 2018-11
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 22750886 bytes, 531 examples
  - Download size: 11400549 bytes
  - Dataset size: 22750886 bytes
- 2018-12
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 19157411 bytes, 475 examples
  - Download size: 9548624 bytes
  - Dataset size: 19157411 bytes
- 2019-01
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21024786 bytes, 498 examples
  - Download size: 10499015 bytes
  - Dataset size: 21024786 bytes
- 2019-02
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21517028 bytes, 506 examples
  - Download size: 10736779 bytes
  - Dataset size: 21517028 bytes
- 2019-03
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21397298 bytes, 500 examples
  - Download size: 10804690 bytes
  - Dataset size: 21397298 bytes
- 2019-04
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 23049654 bytes, 535 examples
  - Download size: 11329714 bytes
  - Dataset size: 23049654 bytes
- 2019-05
  - Features:
    - entry_id: string
    - published: string
    - title: string
    - authors: sequence of string
    - primary_category: string
    - categories: sequence of string
    - text: string
  - Splits:
    - train: 21896838 bytes, 522 examples
  - Download size: 10901776 bytes
  - Dataset size: 21896838 bytes
- 2019-06
  - Features:
    - entry_id: string
    - published




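The per-configuration figures support some quick arithmetic, such as the average number of bytes per example and the ratio of dataset size to download size. The sketch below uses the 2017-01 to 2017-03 values copied from the listing and treats every size as a byte count; the SIZES table and the config_stats and totals helpers are hypothetical names chosen for illustration.

```python
# (num_bytes, num_examples, download_size) for a few monthly configs,
# copied from the listing; all sizes are byte counts.
SIZES = {
    "2017-01": (19895148, 482, 9877238),
    "2017-02": (20111448, 499, 9967413),
    "2017-03": (20815725, 500, 10425653),
}


def config_stats(num_bytes: int, num_examples: int, download_size: int) -> dict:
    """Average bytes per example and dataset-size-to-download-size ratio."""
    return {
        "bytes_per_example": num_bytes / num_examples,
        "size_ratio": num_bytes / download_size,
    }


def totals(sizes: dict) -> tuple[int, int, int]:
    """Sum num_bytes, num_examples and download_size column-wise."""
    return tuple(sum(col) for col in zip(*sizes.values()))


if __name__ == "__main__":
    for name, row in SIZES.items():
        stats = config_stats(*row)
        print(f"{name}: {stats['bytes_per_example']:.0f} bytes/example, "
              f"size ratio {stats['size_ratio']:.2f}")
    print("totals (bytes, examples, download bytes):", totals(SIZES))
```

For 2017-01 this gives roughly 41,276 bytes per example and a dataset-to-download ratio of about 2.01; the same helpers apply unchanged to any other month in the list.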