HoangHa/100BT-dLLM-pretokenized
Source: Hugging Face. Updated 2026-03-09; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/HoangHa/100BT-dLLM-pretokenized
Resource description:
---
dataset_info:
  features:
  - name: input_ids
    list: uint16
  splits:
  - name: train_0000
    num_bytes: 24572802872
    num_examples: 5000000
  - name: train_0001
    num_bytes: 24427819772
    num_examples: 5000000
  - name: train_0002
    num_bytes: 24623903676
    num_examples: 5000000
  - name: train_0003
    num_bytes: 24616010070
    num_examples: 5000000
  - name: train_0004
    num_bytes: 24637598452
    num_examples: 5000000
  - name: train_0005
    num_bytes: 24663064108
    num_examples: 5000000
  - name: train_0006
    num_bytes: 24552607948
    num_examples: 5000000
  - name: train_0007
    num_bytes: 24632671672
    num_examples: 5000000
  - name: train_0008
    num_bytes: 24620108654
    num_examples: 5000000
  - name: train_0009
    num_bytes: 24602190610
    num_examples: 5000000
  - name: train_0010
    num_bytes: 24571793730
    num_examples: 5000000
  - name: train_0011
    num_bytes: 24678758758
    num_examples: 5000000
  - name: train_0012
    num_bytes: 10354772310
    num_examples: 2119279
  download_size: 611004341836
  dataset_size: 305554102632
configs:
- config_name: default
  data_files:
  - split: train_0000
    path: data/train_0000-*
  - split: train_0001
    path: data/train_0001-*
  - split: train_0002
    path: data/train_0002-*
  - split: train_0003
    path: data/train_0003-*
  - split: train_0004
    path: data/train_0004-*
  - split: train_0005
    path: data/train_0005-*
  - split: train_0006
    path: data/train_0006-*
  - split: train_0007
    path: data/train_0007-*
  - split: train_0008
    path: data/train_0008-*
  - split: train_0009
    path: data/train_0009-*
  - split: train_0010
    path: data/train_0010-*
  - split: train_0011
    path: data/train_0011-*
  - split: train_0012
    path: data/train_0012-*
---
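
The per-split figures in the metadata can be cross-checked against the reported totals. The following is a minimal sketch that only re-adds the numbers listed above; the variable names are illustrative and are not part of the dataset card.

```python
# Per-split sizes in bytes, copied from the dataset_info block of the card.
SPLIT_BYTES = {
    "train_0000": 24572802872,
    "train_0001": 24427819772,
    "train_0002": 24623903676,
    "train_0003": 24616010070,
    "train_0004": 24637598452,
    "train_0005": 24663064108,
    "train_0006": 24552607948,
    "train_0007": 24632671672,
    "train_0008": 24620108654,
    "train_0009": 24602190610,
    "train_0010": 24571793730,
    "train_0011": 24678758758,
    "train_0012": 10354772310,
}

# Every split holds 5,000,000 examples except the last, shorter one.
SPLIT_EXAMPLES = {name: 5_000_000 for name in SPLIT_BYTES}
SPLIT_EXAMPLES["train_0012"] = 2_119_279

# Total reported by the card as dataset_size.
DATASET_SIZE = 305554102632

total_bytes = sum(SPLIT_BYTES.values())
total_examples = sum(SPLIT_EXAMPLES.values())
avg_bytes_per_example = total_bytes / total_examples

print("sizes add up to dataset_size:", total_bytes == DATASET_SIZE)
print("total examples:", total_examples)
print("average bytes per example:", round(avg_bytes_per_example))
```

The split sizes sum exactly to the reported `dataset_size`, and the 13 splits together hold 62,119,279 examples.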
Provider:
HoangHa
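
Because each split is tens of gigabytes, streaming one split at a time is likely more practical than a full download. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable; the helper names `split_names`, `stream_split`, and `peek` are hypothetical and do not come from the dataset card.

```python
def split_names(count=13):
    """Return the split names listed in the card: train_0000 .. train_0012."""
    return [f"train_{i:04d}" for i in range(count)]


def stream_split(split, repo_id="HoangHa/100BT-dLLM-pretokenized"):
    """Open one split lazily, without downloading it in full.

    Assumption: the `datasets` package is available. It is imported here
    so the rest of the module works without it.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


def peek(split="train_0012", n=1):
    """Print the length and first few token ids of the first n examples."""
    for i, row in enumerate(stream_split(split)):
        if i >= n:
            break
        ids = row["input_ids"]  # the only feature declared in the card
        print(len(ids), ids[:8])


if __name__ == "__main__":
    print(split_names())
    # peek()  # uncomment to stream from the Hub (requires network access)
```

The card declares a single `input_ids` feature of type `uint16` and does not name the tokenizer, so decoding the ids back to text requires knowing which tokenizer produced them.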



