cognitivecomputations/dolphin-distill
Source: Hugging Face | Updated: 2025-06-16 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/cognitivecomputations/dolphin-distill
Description:
Dolphin Distill is a curated mixture of high-quality instruction-following and reasoning datasets, intended for training and fine-tuning language models. It contains 11,598,465 samples drawn from 20 source datasets. Coverage spans reasoning and mathematical problem-solving, software engineering and code, research and problem-solving, multilingual and diverse instructions, evaluation and benchmarks, and advanced reasoning. The average sample length is 566.28 tokens, with a total of 6,606,952,787 tokens and an estimated size of 6.15 GB.
Provided by:
cognitivecomputations



