
HoangHa/100BT-dLLM-pretokenized

Source: Hugging Face. Updated: 2026-03-09. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/HoangHa/100BT-dLLM-pretokenized
Description:
---
dataset_info:
  features:
  - name: input_ids
    list: uint16
  splits:
  - name: train_0000
    num_bytes: 24572802872
    num_examples: 5000000
  - name: train_0001
    num_bytes: 24427819772
    num_examples: 5000000
  - name: train_0002
    num_bytes: 24623903676
    num_examples: 5000000
  - name: train_0003
    num_bytes: 24616010070
    num_examples: 5000000
  - name: train_0004
    num_bytes: 24637598452
    num_examples: 5000000
  - name: train_0005
    num_bytes: 24663064108
    num_examples: 5000000
  - name: train_0006
    num_bytes: 24552607948
    num_examples: 5000000
  - name: train_0007
    num_bytes: 24632671672
    num_examples: 5000000
  - name: train_0008
    num_bytes: 24620108654
    num_examples: 5000000
  - name: train_0009
    num_bytes: 24602190610
    num_examples: 5000000
  - name: train_0010
    num_bytes: 24571793730
    num_examples: 5000000
  - name: train_0011
    num_bytes: 24678758758
    num_examples: 5000000
  - name: train_0012
    num_bytes: 10354772310
    num_examples: 2119279
  download_size: 611004341836
  dataset_size: 305554102632
configs:
- config_name: default
  data_files:
  - split: train_0000
    path: data/train_0000-*
  - split: train_0001
    path: data/train_0001-*
  - split: train_0002
    path: data/train_0002-*
  - split: train_0003
    path: data/train_0003-*
  - split: train_0004
    path: data/train_0004-*
  - split: train_0005
    path: data/train_0005-*
  - split: train_0006
    path: data/train_0006-*
  - split: train_0007
    path: data/train_0007-*
  - split: train_0008
    path: data/train_0008-*
  - split: train_0009
    path: data/train_0009-*
  - split: train_0010
    path: data/train_0010-*
  - split: train_0011
    path: data/train_0011-*
  - split: train_0012
    path: data/train_0012-*
---
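The split metadata above can be sanity-checked with a little arithmetic, and a single split can be read lazily instead of downloading everything. The following is a minimal sketch, not an official loader: the helper name `stream_split` is hypothetical, and it assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in the title.

```python
# Per-split num_bytes, copied from the dataset card metadata (train_0000 .. train_0012).
SPLIT_BYTES = [
    24572802872, 24427819772, 24623903676, 24616010070, 24637598452,
    24663064108, 24552607948, 24632671672, 24620108654, 24602190610,
    24571793730, 24678758758, 10354772310,
]
# Per-split num_examples: twelve full splits plus a shorter final one.
SPLIT_EXAMPLES = [5_000_000] * 12 + [2_119_279]

DATASET_SIZE = 305554102632  # dataset_size from the card

SPLITS = {
    f"train_{i:04d}": (num_bytes, num_examples)
    for i, (num_bytes, num_examples) in enumerate(zip(SPLIT_BYTES, SPLIT_EXAMPLES))
}

total_bytes = sum(num_bytes for num_bytes, _ in SPLITS.values())
total_examples = sum(num_examples for _, num_examples in SPLITS.values())


def stream_split(split="train_0012"):
    """Hypothetical helper: iterate over one split without a full download.

    Assumes `pip install datasets` and network access to the Hub.
    """
    from datasets import load_dataset  # imported lazily so the checks above run offline

    return load_dataset(
        "HoangHa/100BT-dLLM-pretokenized", split=split, streaming=True
    )


if __name__ == "__main__":
    print("split sizes sum to dataset_size:", total_bytes == DATASET_SIZE)
    print("total examples:", total_examples)
```

The per-split `num_bytes` values add up exactly to the stated `dataset_size`, and the example counts total 62,119,279. Each record returned by a streamed split should expose the single `input_ids` feature listed in the card.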
Provider:
HoangHa