Ba2han/mixed-tokenized_2811
Hugging Face · updated 2025-11-29 · indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Ba2han/mixed-tokenized_2811
Resource description:
---
dataset_info:
  features:
  - name: input_ids
    list: int32
  splits:
  - name: train_chunk_1
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_2
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_3
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_4
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_5
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_6
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_7
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_8
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_9
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_10
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_11
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_12
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_13
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_14
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_15
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_16
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_17
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_18
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_19
    num_bytes: 3274400000
    num_examples: 200000
  - name: train_chunk_20
    num_bytes: 1884237108
    num_examples: 115089
  download_size: 26359338785
  dataset_size: 64097837108
configs:
- config_name: default
  data_files:
  - split: train_chunk_1
    path: data/train_chunk_1-*
  - split: train_chunk_2
    path: data/train_chunk_2-*
  - split: train_chunk_3
    path: data/train_chunk_3-*
  - split: train_chunk_4
    path: data/train_chunk_4-*
  - split: train_chunk_5
    path: data/train_chunk_5-*
  - split: train_chunk_6
    path: data/train_chunk_6-*
  - split: train_chunk_7
    path: data/train_chunk_7-*
  - split: train_chunk_8
    path: data/train_chunk_8-*
  - split: train_chunk_9
    path: data/train_chunk_9-*
  - split: train_chunk_10
    path: data/train_chunk_10-*
  - split: train_chunk_11
    path: data/train_chunk_11-*
  - split: train_chunk_12
    path: data/train_chunk_12-*
  - split: train_chunk_13
    path: data/train_chunk_13-*
  - split: train_chunk_14
    path: data/train_chunk_14-*
  - split: train_chunk_15
    path: data/train_chunk_15-*
  - split: train_chunk_16
    path: data/train_chunk_16-*
  - split: train_chunk_17
    path: data/train_chunk_17-*
  - split: train_chunk_18
    path: data/train_chunk_18-*
  - split: train_chunk_19
    path: data/train_chunk_19-*
  - split: train_chunk_20
    path: data/train_chunk_20-*
---
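
The split figures in the card can be cross-checked with a little arithmetic. The sketch below only re-uses numbers copied from the metadata; the constant and function names are illustrative and not part of the dataset. It confirms that the per-split sizes add up to the declared `dataset_size`, and that every split works out to the same number of bytes per example. That uniformity suggests fixed-length sequences, although the card itself does not state a sequence length.

```python
# Figures copied from the dataset card metadata; names are illustrative only.
FULL_CHUNKS = 19                       # train_chunk_1 .. train_chunk_19
FULL_CHUNK_BYTES = 3_274_400_000
FULL_CHUNK_EXAMPLES = 200_000
LAST_CHUNK_BYTES = 1_884_237_108       # train_chunk_20
LAST_CHUNK_EXAMPLES = 115_089
DECLARED_DATASET_SIZE = 64_097_837_108


def summarize_card():
    """Recompute totals and per-example sizes from the split metadata."""
    total_bytes = FULL_CHUNKS * FULL_CHUNK_BYTES + LAST_CHUNK_BYTES
    total_examples = FULL_CHUNKS * FULL_CHUNK_EXAMPLES + LAST_CHUNK_EXAMPLES
    return {
        "total_bytes": total_bytes,
        "total_examples": total_examples,
        "matches_declared_size": total_bytes == DECLARED_DATASET_SIZE,
        # divmod returns (quotient, remainder); a zero remainder means the
        # split size is an exact multiple of the example count.
        "bytes_per_example_full": divmod(FULL_CHUNK_BYTES, FULL_CHUNK_EXAMPLES),
        "bytes_per_example_last": divmod(LAST_CHUNK_BYTES, LAST_CHUNK_EXAMPLES),
    }


summary = summarize_card()
print(summary)
```

Both per-example figures come out to exactly 16,372 bytes. With 4-byte `int32` values that corresponds to roughly 4,093 token IDs per example if storage overhead is ignored; treat this as an estimate rather than a documented property.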
Provider:
Ba2han
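
Because the data is published as 20 separate splits rather than a single `train` split, each chunk has to be requested by name. The following is a minimal sketch of one way to do that with the Hugging Face `datasets` library, assuming the repository is reachable and the library is installed. The helper names `chunk_split_names` and `stream_chunk` are hypothetical; streaming is used here only to avoid downloading the full archive (about 26 GB according to `download_size`) up front.

```python
REPO_ID = "Ba2han/mixed-tokenized_2811"
NUM_CHUNKS = 20  # train_chunk_1 .. train_chunk_20, per the card


def chunk_split_names(num_chunks=NUM_CHUNKS):
    """Return the split names listed in the card, in order."""
    return [f"train_chunk_{i}" for i in range(1, num_chunks + 1)]


def stream_chunk(split):
    """Open one split lazily; each record is a dict with an 'input_ids' list.

    The import is local so the name helper above works even when the
    `datasets` library is not installed.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=True)


# Example use (requires network access, so it is left commented out):
# first = next(iter(stream_chunk("train_chunk_1")))
# print(len(first["input_ids"]))
```

To go through the mirror given in the download link instead of the default hub, the usual approach is to set the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before importing the library; this is an assumption about the reader's setup, not something the card specifies.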



