
a-m-team/AM-DeepSeek-R1-Distilled-1.4M

Hugging Face | Updated 2025-03-30 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/a-m-team/AM-DeepSeek-R1-Distilled-1.4M
Overview:
AM-DeepSeek-R1-Distilled-1.4M is a large-scale general reasoning dataset of high-quality, challenging problems. The problems are collected from multiple open-source datasets, semantically deduplicated, and cleaned to remove test-set contamination. All responses are distilled from reasoning models (mostly DeepSeek-R1) and rigorously verified: math problems by answer checking, code problems by test cases, and other tasks by reward-model evaluation. The dataset is split into several configurations: am_0.5M, distilled from other open-source datasets; am_0.9M, distilled from DeepSeek-R1-671B; and a 1k random-sample subset of am_0.9M.
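The verification scheme described above (answer checking for math, test cases for code, reward-model scoring for everything else) could be sketched roughly as follows. This is only an illustrative sketch, not the a-m-team pipeline: the `Sample` type, `extract_final_answer`, `verify`, the `reward_fn` callback, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical record; the real dataset schema may differ."""
    task: str                      # "math", "code", or any other task label
    response: str                  # distilled model response
    reference: str = ""            # expected final answer (used for math)
    tests: List[Callable[[str], bool]] = field(default_factory=list)  # used for code


def extract_final_answer(response: str) -> str:
    """Assumption: the final answer is the last non-empty line of the response."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def verify(sample: Sample,
           reward_fn: Callable[[str], float],
           threshold: float = 0.5) -> bool:
    """Route a sample to the check that matches its task type."""
    if sample.task == "math":
        # Answer checking: compare the extracted answer with the reference.
        return extract_final_answer(sample.response) == sample.reference.strip()
    if sample.task == "code":
        # Test cases: every test must pass; a sample with no tests is rejected.
        return bool(sample.tests) and all(t(sample.response) for t in sample.tests)
    # Other tasks: accept when the (hypothetical) reward score clears the threshold.
    return reward_fn(sample.response) >= threshold
```

In this sketch a filtering pass would simply keep the samples for which `verify` returns `True`; a real pipeline would need a more robust answer parser and sandboxed execution for code tests.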
Provider:
a-m-team