
cognitivecomputations/dolphin-distill

Source: Hugging Face. Updated 2025-06-16. Indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/cognitivecomputations/dolphin-distill
Resource summary:
Dolphin Distill is a curated mixture of high-quality instruction-following and reasoning datasets, intended for training and fine-tuning language models. It contains 11,598,465 samples drawn from 20 source datasets. Coverage spans reasoning and mathematical problem-solving, software engineering and code, research and problem-solving, multilingual and diverse instructions, evaluation and benchmarks, and advanced reasoning. The average sample length is 566.28 tokens, the total token count is 6,606,952,787, and the estimated size is 6.15 GB.
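The summary statistics can be cross-checked with a little arithmetic. The sketch below is only an illustration: it uses the three figures quoted in the summary, and the function and variable names are made up for this example rather than taken from the dataset's tooling.

```python
# Figures quoted in the dataset summary.
NUM_SAMPLES = 11_598_465
AVG_TOKENS_PER_SAMPLE = 566.28
TOTAL_TOKENS = 6_606_952_787


def token_stats(num_samples: int, avg_tokens: float, total_tokens: int) -> dict:
    """Compare the stated token total with the total implied by the mean."""
    # Total implied by multiplying the sample count by the stated average.
    implied_total = num_samples * avg_tokens
    # Average implied by dividing the stated total by the sample count.
    implied_avg = total_tokens / num_samples
    # Relative difference between the stated and implied totals.
    relative_gap = (total_tokens - implied_total) / total_tokens
    return {
        "implied_total": implied_total,
        "implied_avg": implied_avg,
        "relative_gap": relative_gap,
    }


stats = token_stats(NUM_SAMPLES, AVG_TOKENS_PER_SAMPLE, TOTAL_TOKENS)
print(f"implied total tokens: {stats['implied_total']:,.0f}")
print(f"implied avg tokens:   {stats['implied_avg']:.2f}")
print(f"relative gap:         {stats['relative_gap']:.2%}")
```

The stated total and the total implied by the average differ by roughly 0.6%. The listing does not say how either figure was computed, so both are best treated as approximate when budgeting training compute.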
Provider:
cognitivecomputations