orionweller/dolma_20bn_no_math_code
Source: Hugging Face. Updated 2024-06-12; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/orionweller/dolma_20bn_no_math_code
Resource summary:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: added
    dtype: string
  - name: created
    dtype: string
  - name: source
    dtype: string
  - name: original_shard_dir
    dtype: string
  - name: original_shard_idx
    dtype: int64
  - name: num_tokens
    dtype: int64
  splits:
  - name: shard_0
    num_bytes: 10062383166
    num_examples: 3282474
  - name: shard_1
    num_bytes: 10022415750
    num_examples: 2684289
  - name: shard_2
    num_bytes: 10034893329
    num_examples: 2788104
  - name: shard_3
    num_bytes: 10039856413
    num_examples: 2674933
  - name: shard_4
    num_bytes: 10018215628
    num_examples: 2906819
  - name: shard_5
    num_bytes: 10011122171
    num_examples: 2913128
  - name: shard_6
    num_bytes: 10000445582
    num_examples: 3811472
  - name: shard_7
    num_bytes: 10055113446
    num_examples: 3352637
  - name: shard_8
    num_bytes: 10044175974
    num_examples: 3338513
  - name: shard_9
    num_bytes: 10051599320
    num_examples: 3183067
  - name: shard_10
    num_bytes: 10038099477
    num_examples: 3147238
  - name: shard_11
    num_bytes: 10028475837
    num_examples: 5393270
  - name: shard_12
    num_bytes: 2782314868
    num_examples: 1711580
  download_size: 73981900532
  dataset_size: 123189110961
configs:
- config_name: default
  data_files:
  - split: shard_0
    path: data/shard_0-*
  - split: shard_1
    path: data/shard_1-*
  - split: shard_2
    path: data/shard_2-*
  - split: shard_3
    path: data/shard_3-*
  - split: shard_4
    path: data/shard_4-*
  - split: shard_5
    path: data/shard_5-*
  - split: shard_6
    path: data/shard_6-*
  - split: shard_7
    path: data/shard_7-*
  - split: shard_8
    path: data/shard_8-*
  - split: shard_9
    path: data/shard_9-*
  - split: shard_10
    path: data/shard_10-*
  - split: shard_11
    path: data/shard_11-*
  - split: shard_12
    path: data/shard_12-*
---
Provided by: orionweller
Summary of the original information
Dataset overview
Features
- id: string
- text: string
- added: string
- created: string
- source: string
- original_shard_dir: string
- original_shard_idx: integer (int64)
- num_tokens: integer (int64)
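The feature list can be mirrored as a typed record. The sketch below uses Python's standard `dataclasses` module and assumes each row arrives as a plain dict with exactly these eight keys. The class name `DolmaRecord` and the `total_tokens` helper are hypothetical; the card gives only field names and types, so reading `added` and `created` as timestamps and `num_tokens` as a per-document token count is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Iterable, Mapping


@dataclass(frozen=True)
class DolmaRecord:
    """One row, mirroring the eight features listed in the card (hypothetical class)."""
    id: str
    text: str
    added: str               # string per the card; presumably a timestamp (assumption)
    created: str             # string per the card; presumably a timestamp (assumption)
    source: str
    original_shard_dir: str
    original_shard_idx: int  # int64 in the card
    num_tokens: int          # int64 in the card; assumed to be a token count

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "DolmaRecord":
        """Build a record from a dict, rejecting missing keys or mismatched types."""
        values = {}
        for f in fields(cls):
            if f.name not in row:
                raise KeyError(f"missing feature: {f.name}")
            value = row[f.name]
            # f.type is the annotated class (str or int) because annotations are not deferred here
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
                )
            values[f.name] = value
        return cls(**values)


def total_tokens(records: Iterable[DolmaRecord]) -> int:
    """Sum num_tokens over records (hypothetical helper, e.g. for a token budget)."""
    return sum(r.num_tokens for r in records)
```

Validating types at construction catches rows whose integer fields were read back as strings, which is a common failure when data passes through text-based formats.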
Dataset shards
- shard_0:
  - Bytes: 10062383166
  - Examples: 3282474
- shard_1:
  - Bytes: 10022415750
  - Examples: 2684289
- shard_2:
  - Bytes: 10034893329
  - Examples: 2788104
- shard_3:
  - Bytes: 10039856413
  - Examples: 2674933
- shard_4:
  - Bytes: 10018215628
  - Examples: 2906819
- shard_5:
  - Bytes: 10011122171
  - Examples: 2913128
- shard_6:
  - Bytes: 10000445582
  - Examples: 3811472
- shard_7:
  - Bytes: 10055113446
  - Examples: 3352637
- shard_8:
  - Bytes: 10044175974
  - Examples: 3338513
- shard_9:
  - Bytes: 10051599320
  - Examples: 3183067
- shard_10:
  - Bytes: 10038099477
  - Examples: 3147238
- shard_11:
  - Bytes: 10028475837
  - Examples: 5393270
- shard_12:
  - Bytes: 2782314868
  - Examples: 1711580
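The per-shard figures can be cross-checked against the totals the card reports. This sketch copies the byte and example counts from the shard list into a dict (the names `SHARDS` and `summarize` are illustrative only) and checks that the shard bytes add up to the stated dataset size of 123189110961 bytes.

```python
# (num_bytes, num_examples) per split, copied from the dataset card
SHARDS = {
    "shard_0": (10062383166, 3282474),
    "shard_1": (10022415750, 2684289),
    "shard_2": (10034893329, 2788104),
    "shard_3": (10039856413, 2674933),
    "shard_4": (10018215628, 2906819),
    "shard_5": (10011122171, 2913128),
    "shard_6": (10000445582, 3811472),
    "shard_7": (10055113446, 3352637),
    "shard_8": (10044175974, 3338513),
    "shard_9": (10051599320, 3183067),
    "shard_10": (10038099477, 3147238),
    "shard_11": (10028475837, 5393270),
    "shard_12": (2782314868, 1711580),
}
DATASET_SIZE = 123189110961  # dataset_size reported in the card


def summarize(shards):
    """Return (total bytes, total examples, mean bytes per example)."""
    total_bytes = sum(b for b, _ in shards.values())
    total_examples = sum(n for _, n in shards.values())
    return total_bytes, total_examples, total_bytes / total_examples


total_bytes, total_examples, mean_bytes = summarize(SHARDS)
# The shard sizes should account for the whole reported dataset size.
assert total_bytes == DATASET_SIZE, "shard sizes do not add up to dataset_size"
print(f"{total_examples:,} examples, ~{mean_bytes:,.0f} bytes per example")
```

The shard bytes sum exactly to `dataset_size`, and the script reports 41,187,524 examples in total, roughly 2,991 bytes per example as counted by `num_bytes`. Twelve shards sit just above 10 GB each, with `shard_12` holding the remainder.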
Dataset size
- Download size: 73981900532 bytes (about 74.0 GB)
- Dataset size: 123189110961 bytes (about 123.2 GB)
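The two sizes are easier to compare in decimal gigabytes (10^9 bytes) and binary gibibytes (2^30 bytes). The `describe` helper below is a hypothetical convenience, not part of any library. It also computes the ratio of download size to dataset size, which comes out at about 0.60. Attributing the gap to the data files being stored compressed is an assumption; the card does not say.

```python
DOWNLOAD_SIZE = 73981900532   # download_size from the card, in bytes
DATASET_SIZE = 123189110961   # dataset_size from the card, in bytes


def describe(num_bytes: int) -> str:
    """Format a byte count as decimal GB and binary GiB, one decimal place each."""
    return f"{num_bytes / 10**9:.1f} GB ({num_bytes / 2**30:.1f} GiB)"


# Fraction of the reported dataset size that actually has to be downloaded.
ratio = DOWNLOAD_SIZE / DATASET_SIZE

print("download:", describe(DOWNLOAD_SIZE))
print("dataset: ", describe(DATASET_SIZE))
print(f"download/dataset ratio: {ratio:.2f}")
```

This prints 74.0 GB (68.9 GiB) for the download and 123.2 GB (114.7 GiB) for the dataset. Quoting both units avoids the common confusion between disk-vendor gigabytes and the gibibytes many tools report.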
Configuration
- config_name: default
- data_files:
  - shard_0: data/shard_0-*
  - shard_1: data/shard_1-*
  - shard_2: data/shard_2-*
  - shard_3: data/shard_3-*
  - shard_4: data/shard_4-*
  - shard_5: data/shard_5-*
  - shard_6: data/shard_6-*
  - shard_7: data/shard_7-*
  - shard_8: data/shard_8-*
  - shard_9: data/shard_9-*
  - shard_10: data/shard_10-*
  - shard_11: data/shard_11-*
  - shard_12: data/shard_12-*
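Each split is bound to a glob pattern under `data/`. This minimal sketch uses the standard-library `fnmatch` module to show how a file path would be routed to its split. The names `DATA_FILES` and `resolve_split` and the example file names are hypothetical; the card gives only the patterns. The hyphen after the shard number matters: it keeps `data/shard_1-*` from also matching the files of `shard_10`, `shard_11`, and `shard_12`.

```python
from fnmatch import fnmatchcase
from typing import Optional

# split name -> glob pattern, as listed under data_files in the card
DATA_FILES = {f"shard_{i}": f"data/shard_{i}-*" for i in range(13)}


def resolve_split(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does.

    Raises ValueError if more than one pattern matches, i.e. the patterns overlap.
    """
    matches = [s for s, pattern in DATA_FILES.items() if fnmatchcase(path, pattern)]
    if len(matches) > 1:
        raise ValueError(f"ambiguous path {path!r}: {matches}")
    return matches[0] if matches else None


# Hypothetical file names, for illustration only.
for p in ["data/shard_1-00000-of-00020.parquet",
          "data/shard_10-00003-of-00020.parquet",
          "README.md"]:
    print(p, "->", resolve_split(p))
```

Had the patterns been written without the hyphen (`data/shard_1*`), `shard_1` would also claim every `shard_1x` file and the function would raise. With the Hugging Face `datasets` library, a single split should be selectable by name, for example `load_dataset("orionweller/dolma_20bn_no_math_code", split="shard_0", streaming=True)`.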