orionweller/dolma_20bn_wiki_upsample
Source: Hugging Face. Updated: 2024-06-12. Indexed: 2024-06-29.
Download link:
https://hf-mirror.com/datasets/orionweller/dolma_20bn_wiki_upsample
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: added
    dtype: string
  - name: created
    dtype: string
  - name: source
    dtype: string
  - name: original_shard_dir
    dtype: string
  - name: original_shard_idx
    dtype: int64
  - name: num_tokens
    dtype: int64
  splits:
  - name: shard_0
    num_bytes: 10048343063
    num_examples: 3082936
  - name: shard_1
    num_bytes: 10025703829
    num_examples: 2736677
  - name: shard_2
    num_bytes: 10015117262
    num_examples: 2722726
  - name: shard_3
    num_bytes: 10002162828
    num_examples: 2850395
  - name: shard_4
    num_bytes: 10048812357
    num_examples: 2893974
  - name: shard_5
    num_bytes: 10016959439
    num_examples: 3759486
  - name: shard_6
    num_bytes: 10043574169
    num_examples: 3389532
  - name: shard_7
    num_bytes: 10011168227
    num_examples: 3183976
  - name: shard_8
    num_bytes: 10019125382
    num_examples: 3147012
  - name: shard_9
    num_bytes: 10043973897
    num_examples: 4916390
  - name: shard_10
    num_bytes: 10136633345
    num_examples: 2857695
  - name: shard_11
    num_bytes: 11034916419
    num_examples: 3568971
  - name: shard_12
    num_bytes: 5259699689
    num_examples: 2676658
  download_size: 73281475328
  dataset_size: 126706189906
configs:
- config_name: default
  data_files:
  - split: shard_0
    path: data/shard_0-*
  - split: shard_1
    path: data/shard_1-*
  - split: shard_2
    path: data/shard_2-*
  - split: shard_3
    path: data/shard_3-*
  - split: shard_4
    path: data/shard_4-*
  - split: shard_5
    path: data/shard_5-*
  - split: shard_6
    path: data/shard_6-*
  - split: shard_7
    path: data/shard_7-*
  - split: shard_8
    path: data/shard_8-*
  - split: shard_9
    path: data/shard_9-*
  - split: shard_10
    path: data/shard_10-*
  - split: shard_11
    path: data/shard_11-*
  - split: shard_12
    path: data/shard_12-*
---
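Because the metadata above exposes each shard as its own split, a single shard can probably be read without downloading the whole dataset. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable; the helper names `split_name` and `stream_shard` are hypothetical and not part of the dataset card.

```python
from typing import Any, Dict, Iterator

REPO_ID = "orionweller/dolma_20bn_wiki_upsample"
NUM_SHARDS = 13  # shard_0 .. shard_12, as listed in the metadata


def split_name(index: int) -> str:
    """Build the split name for a shard index (hypothetical helper)."""
    if not 0 <= index < NUM_SHARDS:
        raise ValueError(f"shard index must be in [0, {NUM_SHARDS}), got {index}")
    return f"shard_{index}"


def stream_shard(index: int) -> Iterator[Dict[str, Any]]:
    """Yield records from one shard lazily (hypothetical helper).

    streaming=True asks the library to iterate over remote files instead of
    materializing the full split on disk first.
    """
    # Imported lazily so split_name() works without the library installed.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, split=split_name(index), streaming=True)
    yield from dataset
```

Iterating over `stream_shard(0)` should yield dictionaries with the eight fields listed under `features` (`id`, `text`, `added`, `created`, `source`, `original_shard_dir`, `original_shard_idx`, `num_tokens`).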
Provider:
orionweller
Summary of original information
Dataset overview
Dataset features
- id: string
- text: string
- added: string
- created: string
- source: string
- original_shard_dir: string
- original_shard_idx: 64-bit integer
- num_tokens: 64-bit integer
Dataset shard information
- shard_0:
  - Bytes: 10048343063
  - Examples: 3082936
- shard_1:
  - Bytes: 10025703829
  - Examples: 2736677
- shard_2:
  - Bytes: 10015117262
  - Examples: 2722726
- shard_3:
  - Bytes: 10002162828
  - Examples: 2850395
- shard_4:
  - Bytes: 10048812357
  - Examples: 2893974
- shard_5:
  - Bytes: 10016959439
  - Examples: 3759486
- shard_6:
  - Bytes: 10043574169
  - Examples: 3389532
- shard_7:
  - Bytes: 10011168227
  - Examples: 3183976
- shard_8:
  - Bytes: 10019125382
  - Examples: 3147012
- shard_9:
  - Bytes: 10043973897
  - Examples: 4916390
- shard_10:
  - Bytes: 10136633345
  - Examples: 2857695
- shard_11:
  - Bytes: 11034916419
  - Examples: 3568971
- shard_12:
  - Bytes: 5259699689
  - Examples: 2676658
Dataset size
- Download size: 73281475328 bytes
- Dataset size: 126706189906 bytes
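The per-shard figures and the two size totals can be cross-checked with a little arithmetic. The sketch below copies the numbers from the listing and derives the total example count, the ratio of download size to dataset size, and the average bytes per example in each shard; the variable names are illustrative only.

```python
# (num_bytes, num_examples) per shard, copied from the listing.
SHARDS = {
    "shard_0": (10048343063, 3082936),
    "shard_1": (10025703829, 2736677),
    "shard_2": (10015117262, 2722726),
    "shard_3": (10002162828, 2850395),
    "shard_4": (10048812357, 2893974),
    "shard_5": (10016959439, 3759486),
    "shard_6": (10043574169, 3389532),
    "shard_7": (10011168227, 3183976),
    "shard_8": (10019125382, 3147012),
    "shard_9": (10043973897, 4916390),
    "shard_10": (10136633345, 2857695),
    "shard_11": (11034916419, 3568971),
    "shard_12": (5259699689, 2676658),
}
DOWNLOAD_SIZE = 73281475328
DATASET_SIZE = 126706189906

# Totals across all 13 shards.
total_bytes = sum(num_bytes for num_bytes, _ in SHARDS.values())
total_examples = sum(num_examples for _, num_examples in SHARDS.values())

# Fraction of the in-memory size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

# Average record size per shard, in bytes.
avg_bytes_per_example = {
    name: num_bytes / num_examples
    for name, (num_bytes, num_examples) in SHARDS.items()
}

print(f"total bytes:    {total_bytes}")
print(f"total examples: {total_examples}")
print(f"download ratio: {download_ratio:.3f}")
```

The shard byte counts sum exactly to the listed dataset size, and the download is a bit under 60% of that figure, presumably because the stored files are compressed. Average record size varies noticeably between shards: shard_9 holds the most examples at roughly the same byte count as its neighbors.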
Configuration
- config_name: default
- data_files:
  - shard_0: data/shard_0-*
  - shard_1: data/shard_1-*
  - shard_2: data/shard_2-*
  - shard_3: data/shard_3-*
  - shard_4: data/shard_4-*
  - shard_5: data/shard_5-*
  - shard_6: data/shard_6-*
  - shard_7: data/shard_7-*
  - shard_8: data/shard_8-*
  - shard_9: data/shard_9-*
  - shard_10: data/shard_10-*
  - shard_11: data/shard_11-*
  - shard_12: data/shard_12-*
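Each split is bound to its files by a glob pattern, and the hyphen before `*` is what keeps `shard_1` from also capturing the files of `shard_10`, `shard_11`, and `shard_12`. The sketch below illustrates this with Python's `fnmatch` as a stand-in for the Hub's own pattern resolution, which may differ in detail; the file names used are hypothetical, since the listing does not show the actual names under `data/`.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Split name -> glob pattern, as given in the configuration.
PATTERNS: Dict[str, str] = {f"shard_{i}": f"data/shard_{i}-*" for i in range(13)}


def splits_for(path: str) -> List[str]:
    """Return every split whose pattern matches the given file path."""
    return [split for split, pattern in PATTERNS.items() if fnmatch(path, pattern)]


# Hypothetical file names, for illustration only.
print(splits_for("data/shard_1-00000-of-00020.parquet"))
print(splits_for("data/shard_10-00000-of-00020.parquet"))
```

Under these assumptions each file resolves to exactly one split; a pattern such as `data/shard_1*` without the hyphen would have matched both files.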



