a-m-team/AM-DeepSeek-R1-Distilled-1.4M

Name: a-m-team/AM-DeepSeek-R1-Distilled-1.4M
Creator: a-m-team
Published: 2025-03-30 01:30:08
License: No description provided

Hugging Face. Updated: 2025-03-30. Indexed: 2025-04-12.

Download link:

https://hf-mirror.com/datasets/a-m-team/AM-DeepSeek-R1-Distilled-1.4M
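The repository ID in the link above, together with the configuration names given in the description below, is enough for a minimal loading sketch. This assumes the third-party Hugging Face `datasets` library is installed and that the data is exposed under a `train` split, which this listing does not confirm. `load_config` and `KNOWN_CONFIGS` are hypothetical names introduced here for illustration.

```python
# Minimal loading sketch (assumptions: `datasets` is installed, split is "train").
REPO_ID = "a-m-team/AM-DeepSeek-R1-Distilled-1.4M"

# Configuration names taken from the dataset description; the name of the
# 1k sample configuration is not given in this listing, so it is omitted.
KNOWN_CONFIGS = ("am_0.5M", "am_0.9M")


def load_config(name: str, split: str = "train", streaming: bool = True):
    """Load one configuration, streaming by default to avoid a full download."""
    if name not in KNOWN_CONFIGS:
        raise ValueError(f"unknown configuration {name!r}; expected one of {KNOWN_CONFIGS}")
    # Imported lazily so the module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split=split, streaming=streaming)
```

With network access, `next(iter(load_config("am_0.9M")))` would be expected to return the first record. Streaming is used as the default here because the full dataset is large.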


Resource overview:

AM-DeepSeek-R1-Distilled-1.4M is a large-scale general reasoning dataset of high-quality, challenging problems. The problems are collected from numerous open-source datasets, semantically deduplicated, and cleaned to eliminate test set contamination. All responses are distilled from reasoning models (mostly DeepSeek-R1) and rigorously verified: mathematical problems through answer checking, code problems through test cases, and other tasks through reward model evaluation. The dataset is divided into several configurations: am_0.5M, distilled from other open-source datasets; am_0.9M, distilled from DeepSeek-R1-671B; and a 1k random-sample subset of am_0.9M.
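The three verification routes named above (answer checking, test cases, reward model scoring) can be pictured as a simple dispatch on task type. The following is an illustrative sketch only, not the a-m-team pipeline: the function names, the sample dictionary keys, the 0.5 threshold, and `toy_reward` are all hypothetical, and a real system would run untrusted code in a sandbox rather than with `exec`.

```python
from typing import Callable, Dict, List


def check_math(response: str, reference: str) -> bool:
    """Answer checking: compare whitespace- and case-normalized final answers."""
    def normalize(text: str) -> str:
        return text.strip().replace(" ", "").lower()
    return normalize(response) == normalize(reference)


def check_code(source: str, test_cases: List[str]) -> bool:
    """Test-case checking: run the code, then each assert-style test statement.

    WARNING: exec on untrusted code is unsafe; this is for illustration only.
    """
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)
        for case in test_cases:
            exec(case, namespace)
    except Exception:
        return False
    return True


def check_other(response: str, reward_fn: Callable[[str], float],
                threshold: float = 0.5) -> bool:
    """Reward-model checking: keep responses scoring at or above a threshold."""
    return reward_fn(response) >= threshold


def verify(sample: Dict, reward_fn: Callable[[str], float]) -> bool:
    """Route a sample to the checker that matches its task type."""
    if sample["task"] == "math":
        return check_math(sample["response"], sample["reference"])
    if sample["task"] == "code":
        return check_code(sample["response"], sample["tests"])
    return check_other(sample["response"], reward_fn)


def toy_reward(text: str) -> float:
    """Stand-in for a reward model: longer answers score higher, capped at 1.0."""
    return min(len(text.split()) / 10, 1.0)
```

A response is kept only if `verify` returns `True`; for example, a math sample passes when its normalized answer matches the reference, and a code sample passes only when every test statement runs without raising.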

Provider:

a-m-team
