cognitivecomputations/dolphin-distill

Name: cognitivecomputations/dolphin-distill
Creator: cognitivecomputations
Published: 2025-06-16 00:15:59
License: not specified

Source: Hugging Face. Updated: 2025-06-16. Indexed: 2025-07-05.

Download link:

https://hf-mirror.com/datasets/cognitivecomputations/dolphin-distill

Description:

Dolphin Distill is a curated mixture of high-quality instruction-following and reasoning datasets, intended for training and fine-tuning language models. It contains 11,598,465 samples drawn from 20 source datasets. The sources span reasoning and mathematical problem solving, software engineering and code, research and problem solving, multilingual and diverse instructions, evaluation and benchmarks, and advanced reasoning. The reported average sample length is 566.28 tokens, with a total of 6,606,952,787 tokens and an estimated size of 6.15 GB.
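The reported statistics can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the dataset's own tooling: the constants are copied from the description, and the helper names `implied_average` and `relative_gap` are hypothetical, introduced only for illustration.

```python
# Figures quoted in the dataset description.
NUM_SAMPLES = 11_598_465
TOTAL_TOKENS = 6_606_952_787
STATED_AVG_TOKENS = 566.28


def implied_average(total_tokens: int, num_samples: int) -> float:
    """Mean tokens per sample implied by the total token and sample counts."""
    return total_tokens / num_samples


def relative_gap(stated: float, implied: float) -> float:
    """Relative difference between the stated and the implied value."""
    return abs(implied - stated) / stated


if __name__ == "__main__":
    avg = implied_average(TOTAL_TOKENS, NUM_SAMPLES)
    gap = relative_gap(STATED_AVG_TOKENS, avg)
    print(f"implied average: {avg:.2f} tokens per sample")
    print(f"relative gap to stated average: {gap:.2%}")
```

Dividing the token total by the sample count gives roughly 569.64 tokens per sample, which is within about 1% of the stated 566.28 but not identical. The description does not say how the average was computed, so the small gap may simply reflect a different counting method or rounding.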

Provided by:

cognitivecomputations
