ibndias/DeepSeek-Distilled-40M

Name: ibndias/DeepSeek-Distilled-40M
Creator: ibndias
Published: 2025-06-08 15:20:44
License: No description available

Source: Hugging Face. Updated: 2025-06-08. Indexed: 2025-10-25.

Download link:

https://hf-mirror.com/datasets/ibndias/DeepSeek-Distilled-40M

Description:

AM-DeepSeek-Distilled-40M is a large-scale, difficulty-graded reasoning dataset containing approximately 3.34 million unique queries and 40 million model-generated responses. It covers five major categories: code, math, science, instruction following, and other general reasoning tasks. Each query is paired with responses distilled from three models of different sizes (DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, and DeepSeek-R1). Each model generated four sampled responses per query, which accounts for the total response count (3.34 million queries × 3 models × 4 samples ≈ 40 million). Difficulty ratings are derived from the comparative success rates of these differently sized models, which significantly reduces the bias inherent in difficulty grades derived from a single model.
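The description does not spell out how success rates are turned into a difficulty rating, so the following is only a minimal sketch of one way it could work. The `grade` rule, its `threshold` parameter, the label names, and the input layout (a list of four correct/incorrect flags per model) are all hypothetical and are not taken from the dataset. Only the three model names, the four samples per model, and the approximate query count come from the description.

```python
from typing import Dict, List

# Model names and sample count as stated in the dataset description.
MODELS = [
    "DeepSeek-R1-Distill-Qwen-1.5B",
    "DeepSeek-R1-Distill-Qwen-7B",
    "DeepSeek-R1",
]
SAMPLES_PER_MODEL = 4
QUERIES = 3_340_000  # approximate figure from the description

# Sanity check on the headline numbers: about 40 million responses.
TOTAL_RESPONSES = QUERIES * len(MODELS) * SAMPLES_PER_MODEL


def pass_rates(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Fraction of correct samples per model for a single query."""
    rates = {}
    for model in MODELS:
        samples = results[model]
        if len(samples) != SAMPLES_PER_MODEL:
            raise ValueError(f"expected {SAMPLES_PER_MODEL} samples for {model}")
        rates[model] = sum(samples) / len(samples)
    return rates


def grade(rates: Dict[str, float], threshold: float = 0.5) -> str:
    """Hypothetical rule: label a query by the smallest model that
    reaches the threshold pass rate. Not the dataset's actual method."""
    small, medium, large = (rates[m] for m in MODELS)
    if small >= threshold:
        return "easy"
    if medium >= threshold:
        return "medium"
    if large >= threshold:
        return "hard"
    return "very hard"


# Illustrative, made-up outcomes for one query.
example = {
    MODELS[0]: [False, False, False, True],
    MODELS[1]: [True, True, False, True],
    MODELS[2]: [True, True, True, True],
}
example_rates = pass_rates(example)
example_grade = grade(example_rates)
```

Comparing pass rates across models of different sizes, rather than using one model's pass rate alone, is what lets a query that only the largest model solves be separated from one that every model solves.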

Provided by:

ibndias
