timaeus/dsir-pile-13m-filtered-for-pubmed-central
Source: Hugging Face. Updated 2024-11-15; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/timaeus/dsir-pile-13m-filtered-for-pubmed-central
Resource description:
The dataset has three features: contents (string), metadata (a record whose pile_set_name field is a sequence of strings), and id (int64). It has a single split, train, with 293,009 samples, reported as 465,947,525.9690335 bytes (roughly 466 MB). The download size is 300,544,310 bytes (roughly 301 MB). Because train is the only split, the total dataset size equals the train split size. The default configuration specifies the data file paths for this split.
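The schema described above can be illustrated with a short Python sketch. This is only an assumption-laden example, not the provider's code: the helper names `matches_schema` and `load_train_split` and the sample record are hypothetical, and the record layout (metadata as a dict holding a list of strings under pile_set_name) is inferred from the description. The loader assumes the third-party `datasets` library is installed and that the repository is reachable.

```python
from typing import Any

# Repository id taken from the listing; the row count is the figure it reports.
REPO_ID = "timaeus/dsir-pile-13m-filtered-for-pubmed-central"
EXPECTED_TRAIN_ROWS = 293_009


def matches_schema(record: Any) -> bool:
    """Return True if a record has the three described features.

    Assumed layout (inferred from the description, not verified):
      contents: str
      metadata: dict with "pile_set_name" -> list of str
      id:       int (int64 in the stored dataset)
    """
    if not isinstance(record, dict):
        return False
    if not isinstance(record.get("contents"), str):
        return False
    record_id = record.get("id")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(record_id, int) or isinstance(record_id, bool):
        return False
    metadata = record.get("metadata")
    if not isinstance(metadata, dict):
        return False
    names = metadata.get("pile_set_name")
    if not isinstance(names, (list, tuple)):
        return False
    return all(isinstance(name, str) for name in names)


def load_train_split(streaming: bool = True):
    """Load the train split (hypothetical helper; needs `datasets` and network).

    Streaming avoids downloading the full ~300 MB before reading any rows.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(REPO_ID, split="train", streaming=streaming)


# Hypothetical record, made up for illustration; not taken from the dataset.
sample = {
    "contents": "Example article text.",
    "metadata": {"pile_set_name": ["PubMed Central"]},
    "id": 0,
}
```

Calling `matches_schema(sample)` checks the hypothetical record offline. To check real rows, iterate over `load_train_split()` and pass each row to `matches_schema`; if the actual metadata layout differs from the assumption above, the check would need to be adjusted.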
Provider:
timaeus



