toksuitebackup/Qwen-Qwen3-8B-toksuite-detokenized
Hugging Face | Updated 2025-12-17 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/toksuitebackup/Qwen-Qwen3-8B-toksuite-detokenized
Description:
Training data of the model, detokenized and stored in the exact order in which the model saw it. The data is partitioned into 8 chunks (chunk-0 through chunk-7), based on the GPU rank that generated the data. Each chunk contains detokenized text files in JSON Lines format (.jsonl). The dataset covers multiple languages, including English, Turkish, Persian, Chinese, and Italian.
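As a rough illustration of the chunk layout described above, the following sketch walks a local copy of the dataset and yields records rank by rank. It rests on several assumptions not confirmed by the listing: that each chunk is a directory named `chunk-N` directly under the download root, that sorting the `.jsonl` filenames reproduces the intended file order within a rank, and that every non-empty line is a JSON object. The function name `iter_chunk_records` is hypothetical, and no record field names are assumed.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_chunk_records(root: str, num_chunks: int = 8) -> Iterator[Tuple[int, dict]]:
    """Yield (rank, record) pairs from chunk-0 .. chunk-{num_chunks - 1}.

    Assumed layout (not confirmed by the dataset card):
        <root>/chunk-<rank>/*.jsonl
    """
    for rank in range(num_chunks):
        chunk_dir = Path(root) / f"chunk-{rank}"
        if not chunk_dir.is_dir():
            continue  # tolerate a partial download
        # Assumption: lexicographic filename order matches the file order
        # within a rank; adjust if the files use unpadded numeric suffixes.
        for path in sorted(chunk_dir.glob("*.jsonl")):
            with path.open(encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if line:  # skip blank lines
                        yield rank, json.loads(line)
```

Calling `next(iter_chunk_records("path/to/local/copy"))` on a downloaded copy is a quick way to inspect the keys of the first record before relying on any particular field. Note that this iterates one rank at a time; how the per-rank streams interleave into a single global training order is not stated in the description.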
Provided by:
toksuitebackup



