orionweller/dolma_20bn_instruct_upsample
Source: Hugging Face. Updated 2024-06-13; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/orionweller/dolma_20bn_instruct_upsample
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: text
    dtype: string
  - name: added
    dtype: string
  - name: created
    dtype: string
  - name: source
    dtype: string
  - name: original_shard_dir
    dtype: string
  - name: original_shard_idx
    dtype: int64
  - name: num_tokens
    dtype: int64
  splits:
  - name: shard_0
    num_bytes: 10010006336
    num_examples: 2997700
  - name: shard_1
    num_bytes: 10041512099
    num_examples: 2763408
  - name: shard_2
    num_bytes: 10015810172
    num_examples: 2719516
  - name: shard_3
    num_bytes: 10039547720
    num_examples: 2911206
  - name: shard_4
    num_bytes: 10000995298
    num_examples: 3561063
  - name: shard_5
    num_bytes: 10008788050
    num_examples: 3382942
  - name: shard_6
    num_bytes: 10009654657
    num_examples: 3187332
  - name: shard_7
    num_bytes: 10055922854
    num_examples: 3175214
  - name: shard_8
    num_bytes: 10120258157
    num_examples: 4621161
  - name: shard_9
    num_bytes: 10456646323
    num_examples: 5064727
  - name: shard_10
    num_bytes: 10238955984
    num_examples: 10307824
  - name: shard_11
    num_bytes: 10130423031
    num_examples: 3942796
  - name: shard_12
    num_bytes: 6457714538
    num_examples: 5730807
  download_size: 63642442616
  dataset_size: 127586235219
configs:
- config_name: default
  data_files:
  - split: shard_0
    path: data/shard_0-*
  - split: shard_1
    path: data/shard_1-*
  - split: shard_2
    path: data/shard_2-*
  - split: shard_3
    path: data/shard_3-*
  - split: shard_4
    path: data/shard_4-*
  - split: shard_5
    path: data/shard_5-*
  - split: shard_6
    path: data/shard_6-*
  - split: shard_7
    path: data/shard_7-*
  - split: shard_8
    path: data/shard_8-*
  - split: shard_9
    path: data/shard_9-*
  - split: shard_10
    path: data/shard_10-*
  - split: shard_11
    path: data/shard_11-*
  - split: shard_12
    path: data/shard_12-*
---
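The card metadata above declares one `default` config with thirteen named splits, so a single shard can be read on its own rather than downloading everything. The following is a minimal sketch, assuming the `datasets` library is installed and the repository is reachable on the Hugging Face Hub; the helper names `shard_split_names` and `stream_shard` are hypothetical and not part of the dataset or the library.

```python
from typing import Iterator

REPO_ID = "orionweller/dolma_20bn_instruct_upsample"
NUM_SHARDS = 13  # shard_0 .. shard_12, as listed in the card's splits


def shard_split_names(num_shards: int = NUM_SHARDS) -> list[str]:
    """Return the split names declared in the card metadata."""
    return [f"shard_{i}" for i in range(num_shards)]


def stream_shard(split: str) -> Iterator[dict]:
    """Iterate over one split lazily instead of downloading it in full.

    Assumption: network access to the Hub and an installed `datasets` package.
    """
    # Imported here so shard_split_names() works without the library installed.
    from datasets import load_dataset

    return iter(load_dataset(REPO_ID, split=split, streaming=True))
```

With those assumptions met, `next(stream_shard("shard_0"))` should yield a dict with the eight fields listed under `features` (for example `text` and `num_tokens`). Streaming is used here because each full shard is on the order of 10 GB.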
Dataset information:
Features:
- name: id, dtype: string
- name: text, dtype: string
- name: added, dtype: string
- name: created, dtype: string
- name: source, dtype: string
- name: original_shard_dir, dtype: string
- name: original_shard_idx, dtype: int64
- name: num_tokens, dtype: int64
Splits:
- name: shard_0, bytes: 10010006336, examples: 2997700
- name: shard_1, bytes: 10041512099, examples: 2763408
- name: shard_2, bytes: 10015810172, examples: 2719516
- name: shard_3, bytes: 10039547720, examples: 2911206
- name: shard_4, bytes: 10000995298, examples: 3561063
- name: shard_5, bytes: 10008788050, examples: 3382942
- name: shard_6, bytes: 10009654657, examples: 3187332
- name: shard_7, bytes: 10055922854, examples: 3175214
- name: shard_8, bytes: 10120258157, examples: 4621161
- name: shard_9, bytes: 10456646323, examples: 5064727
- name: shard_10, bytes: 10238955984, examples: 10307824
- name: shard_11, bytes: 10130423031, examples: 3942796
- name: shard_12, bytes: 6457714538, examples: 5730807
Download size: 63642442616 bytes
Dataset size: 127586235219 bytes
Configs:
- config name: default, data files:
- split: shard_0, path: data/shard_0-*
- split: shard_1, path: data/shard_1-*
- split: shard_2, path: data/shard_2-*
- split: shard_3, path: data/shard_3-*
- split: shard_4, path: data/shard_4-*
- split: shard_5, path: data/shard_5-*
- split: shard_6, path: data/shard_6-*
- split: shard_7, path: data/shard_7-*
- split: shard_8, path: data/shard_8-*
- split: shard_9, path: data/shard_9-*
- split: shard_10, path: data/shard_10-*
- split: shard_11, path: data/shard_11-*
- split: shard_12, path: data/shard_12-*
Provider:
orionweller
Summary of original information
Dataset overview
Dataset features
- id: string
- text: string
- added: string
- created: string
- source: string
- original_shard_dir: string
- original_shard_idx: 64-bit integer
- num_tokens: 64-bit integer
Dataset splits
- shard_0: 10010006336 bytes, 2997700 examples
- shard_1: 10041512099 bytes, 2763408 examples
- shard_2: 10015810172 bytes, 2719516 examples
- shard_3: 10039547720 bytes, 2911206 examples
- shard_4: 10000995298 bytes, 3561063 examples
- shard_5: 10008788050 bytes, 3382942 examples
- shard_6: 10009654657 bytes, 3187332 examples
- shard_7: 10055922854 bytes, 3175214 examples
- shard_8: 10120258157 bytes, 4621161 examples
- shard_9: 10456646323 bytes, 5064727 examples
- shard_10: 10238955984 bytes, 10307824 examples
- shard_11: 10130423031 bytes, 3942796 examples
- shard_12: 6457714538 bytes, 5730807 examples
Dataset size
- Download size: 63642442616 bytes
- Dataset size: 127586235219 bytes
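The per-split figures and the two size totals listed above can be cross-checked with a little arithmetic. The sketch below copies the numbers from this page and derives the total example count, the ratio of dataset size to download size, and the average bytes per example in each shard; the variable names are illustrative only.

```python
# (num_bytes, num_examples) per split, copied from the listing above.
SPLITS = {
    "shard_0": (10010006336, 2997700),
    "shard_1": (10041512099, 2763408),
    "shard_2": (10015810172, 2719516),
    "shard_3": (10039547720, 2911206),
    "shard_4": (10000995298, 3561063),
    "shard_5": (10008788050, 3382942),
    "shard_6": (10009654657, 3187332),
    "shard_7": (10055922854, 3175214),
    "shard_8": (10120258157, 4621161),
    "shard_9": (10456646323, 5064727),
    "shard_10": (10238955984, 10307824),
    "shard_11": (10130423031, 3942796),
    "shard_12": (6457714538, 5730807),
}
DOWNLOAD_SIZE = 63642442616   # bytes, as listed
DATASET_SIZE = 127586235219   # bytes, as listed

# Sum the per-split byte and example counts.
total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(num_examples for _, num_examples in SPLITS.values())

# How much larger the prepared dataset is than the files that are downloaded.
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE

# Average size of one example in each shard, in bytes.
avg_bytes_per_example = {
    name: num_bytes / num_examples
    for name, (num_bytes, num_examples) in SPLITS.items()
}

print(f"splits sum to dataset size: {total_bytes == DATASET_SIZE}")
print(f"total examples: {total_examples}")
print(f"dataset/download size ratio: {size_ratio:.2f}")
```

The per-shard averages vary widely even though most shards are close to 10 GB, which suggests the shards were cut by byte size rather than by example count; that reading is an inference from the numbers, not something the card states.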
Configuration
- config_name: default
- data_files:
- shard_0: data/shard_0-*
- shard_1: data/shard_1-*
- shard_2: data/shard_2-*
- shard_3: data/shard_3-*
- shard_4: data/shard_4-*
- shard_5: data/shard_5-*
- shard_6: data/shard_6-*
- shard_7: data/shard_7-*
- shard_8: data/shard_8-*
- shard_9: data/shard_9-*
- shard_10: data/shard_10-*
- shard_11: data/shard_11-*
- shard_12: data/shard_12-*



