a-m-team/AM-DeepSeek-R1-Distilled-1.4M
Source: Hugging Face. Updated: 2025-03-30. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/a-m-team/AM-DeepSeek-R1-Distilled-1.4M
Description:
AM-DeepSeek-R1-Distilled-1.4M is a large-scale general reasoning dataset of high-quality, challenging reasoning problems. The problems are collected from numerous open-source datasets, semantically deduplicated, and cleaned to eliminate test set contamination. All responses are distilled from reasoning models (mostly DeepSeek-R1) and have been rigorously verified: mathematical problems through answer checking, code problems through test cases, and other tasks through reward model evaluation. The dataset is divided into several configurations: am_0.5M, distilled from other open-source datasets; am_0.9M, distilled from DeepSeek-R1-671B; and a 1k random sample subset of am_0.9M.
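The three verification routes described above (answer checking for math, test cases for code, a reward model for everything else) can be pictured with a small sketch. This is an illustration only, not the dataset authors' pipeline: the sample fields (`task`, `answer`, `reference`, `solution`, `test_cases`, `reward_score`), the exact-match comparison, and the 0.5 reward threshold are all hypothetical choices made for the example.

```python
from typing import Callable, List, Tuple


def check_math(answer: str, reference: str) -> bool:
    """Answer checking: compare final answers after trimming whitespace.
    (Assumption: exact string match; a real checker would normalize more.)"""
    return answer.strip() == reference.strip()


def check_code(solution: Callable[[int], int],
               test_cases: List[Tuple[int, int]]) -> bool:
    """Test-case checking: the solution must pass every (input, expected) pair."""
    return all(solution(x) == expected for x, expected in test_cases)


def check_reward(score: float, threshold: float = 0.5) -> bool:
    """Reward-model checking: keep responses scoring at or above a threshold.
    (Assumption: the score is precomputed and the threshold is arbitrary.)"""
    return score >= threshold


def verify(sample: dict) -> bool:
    """Route a sample to the verifier that matches its task type."""
    task = sample["task"]
    if task == "math":
        return check_math(sample["answer"], sample["reference"])
    if task == "code":
        return check_code(sample["solution"], sample["test_cases"])
    # Any other task type falls back to the reward-model check.
    return check_reward(sample["reward_score"])


# Hypothetical samples, one per verification route.
samples = [
    {"task": "math", "answer": " 42 ", "reference": "42"},
    {"task": "code", "solution": lambda x: x * 2, "test_cases": [(1, 2), (3, 6)]},
    {"task": "chat", "reward_score": 0.3},
]

# Only samples that pass their verifier are kept.
kept = [s for s in samples if verify(s)]
print([s["task"] for s in kept])
```

Routing by task type keeps each check independent, so a stricter math normalizer or a different reward threshold could be swapped in without touching the other routes.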
Provider:
a-m-team



