theglassofwater/pretraining_tokenized_dataset_1
Source: Hugging Face. Updated 2024-05-16; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/theglassofwater/pretraining_tokenized_dataset_1
Description:
---
dataset_info:
  features:
  - name: 'Unnamed: 0'
    dtype: int64
  - name: input_id
    dtype: string
  splits:
  - name: train
    num_bytes: 3283577413
    num_examples: 69843
  download_size: 1016289966
  dataset_size: 3283577413
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
The dataset has two features: Unnamed: 0 (dtype: int64) and input_id (dtype: string). It has a single split, train, with 69,843 examples totaling 3,283,577,413 bytes. The download size is 1,016,289,966 bytes, against a dataset size of 3,283,577,413 bytes. The only configuration is named default, and it reads the training data from files matching data/train-*.
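The size figures above are easier to judge with a little arithmetic. The following is a minimal sketch that uses only the numbers stated in the metadata; the variable names are illustrative and not part of the dataset card. It works out to roughly 47 KB per example on average, with the download being about 31% of the dataset size.

```python
# Figures copied from the dataset metadata above.
NUM_EXAMPLES = 69_843
DATASET_SIZE_BYTES = 3_283_577_413
DOWNLOAD_SIZE_BYTES = 1_016_289_966

# Average size of one example in bytes.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

# The same sizes expressed in GiB (2**30 bytes).
dataset_size_gib = DATASET_SIZE_BYTES / 2**30
download_size_gib = DOWNLOAD_SIZE_BYTES / 2**30

print(f"average bytes per example: {avg_bytes_per_example:,.0f}")
print(f"download / dataset size:   {download_ratio:.1%}")
print(f"dataset size:  {dataset_size_gib:.2f} GiB")
print(f"download size: {download_size_gib:.2f} GiB")
```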
Provider:
theglassofwater
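Summary of original information
Dataset overview
Dataset features
- Unnamed: 0: data type int64.
- input_id: data type string.
Dataset splits
- Training split (train):
  - Number of examples: 69843
  - Data size: 3283577413 bytes
Dataset size
- Download size: 1016289966 bytes
- Total dataset size: 3283577413 bytes
Configuration
- Default configuration (default):
  - Training data path: data/train-*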
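As a hedged illustration of how the default configuration might be used, the sketch below does two things. First, it approximates the data/train-* pattern with Python's fnmatch to show which files such a glob would select; the shard file names in example_paths are hypothetical, and the Hugging Face loader's own glob matching may differ in edge cases. Second, it wraps the `load_dataset` call from the `datasets` library in a helper (a name introduced here) that is defined but not called, since calling it needs network access and the `datasets` package.

```python
from fnmatch import fnmatch

REPO_ID = "theglassofwater/pretraining_tokenized_dataset_1"
TRAIN_GLOB = "data/train-*"  # path pattern from the default configuration


def select_train_files(paths, pattern=TRAIN_GLOB):
    """Return the paths that match the train split's glob pattern.

    Uses fnmatch as an approximation of the loader's glob matching.
    """
    return [p for p in paths if fnmatch(p, pattern)]


def load_train_split(streaming=True):
    """Load the train split; requires `pip install datasets` and network access."""
    # Imported lazily so the rest of this sketch runs without the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=streaming)


# Hypothetical repository listing; the real shard names are not given in the card.
example_paths = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
]

print(select_train_files(example_paths))
```

With streaming enabled, the loader iterates over examples without first downloading the full 1,016,289,966 bytes. Each example should expose the two columns listed above, Unnamed: 0 and input_id.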



