yulan-team/YuLan-Mini-Datasets-Phasae-26
Source: Hugging Face | Updated: 2025-03-20 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/yulan-team/YuLan-Mini-Datasets-Phasae-26
Description:
YuLan-Mini phase 26 is a tokenized language dataset for question answering and text generation. It covers both Chinese and English and is tagged for code and math content. The dataset is large, with a size category of 10B to 100B. Each row is compressed to 28K tokens, and the data can be loaded and processed with the provided Python code.
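The provided loading code is not reproduced on this page. As a minimal sketch of how such a dataset might be read, the snippet below streams rows with the Hugging Face `datasets` library rather than downloading everything at once. The split name `train` and the column name `input_ids` are assumptions, not confirmed by this listing; check the dataset card for the actual schema.

```python
from itertools import islice
from typing import Any, Iterable, List, Mapping

# Repository id as it appears in the download link above.
REPO_ID = "yulan-team/YuLan-Mini-Datasets-Phasae-26"


def stream_rows(repo_id: str = REPO_ID, split: str = "train") -> Iterable[Mapping[str, Any]]:
    """Return a lazy iterator over dataset rows.

    The split name "train" is an assumption. Streaming avoids
    downloading the whole dataset before reading the first row.
    """
    # Imported here so the helpers below work without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


def token_count(row: Mapping[str, Any], column: str = "input_ids") -> int:
    """Count the token ids in one row.

    The column name "input_ids" is an assumption about the schema.
    """
    return len(row[column])


def first_rows(rows: Iterable[Mapping[str, Any]], n: int) -> List[Mapping[str, Any]]:
    """Materialize only the first n rows of a (possibly huge) iterable."""
    return list(islice(rows, n))
```

For example, `first_rows(stream_rows(), 2)` would fetch two rows over the network, and `token_count` could then be used to check how many tokens each row holds.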
Provider:
yulan-team



