SYNTHETIC-2-RL
ModelScope community · Updated 2025-12-05 · Indexed 2025-07-12
Download link:
https://modelscope.cn/datasets/PrimeIntellect/SYNTHETIC-2-RL
Resource description:
# SYNTHETIC-2
SYNTHETIC-2 is an open reasoning dataset spanning math, coding, and general reasoning tasks, along with reasoning traces generated collaboratively. It contains high-quality reasoning traces from DeepSeek-R1-0528, which are well suited for SFT, as well as multiple reasoning traces per task from smaller models, which can be used for difficulty estimation.
To read more about our data collection approach, check out our [blog post](https://www.primeintellect.ai/blog/synthetic-2-release).
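
As an illustration of how multiple traces per task could feed difficulty estimation, the following is a minimal sketch and not necessarily the procedure used for this dataset. It assumes each task comes with a list of rewards in the range 0 to 1, one per smaller-model attempt; the function name `estimate_difficulty` and the input shape are hypothetical.

```python
from statistics import mean


def estimate_difficulty(rewards):
    """Return a difficulty score in [0, 1] for one task.

    `rewards` is assumed to hold one reward per smaller-model attempt,
    each between 0 and 1. Difficulty is defined here (an assumption)
    as one minus the average reward, so tasks that the smaller models
    rarely solve score close to 1.
    """
    if not rewards:
        raise ValueError("at least one reward is required")
    return 1.0 - mean(rewards)


# Two of four attempts solved the task.
half_solved = estimate_difficulty([1, 1, 0, 0])
# Every attempt solved the task.
always_solved = estimate_difficulty([1.0, 1.0])
```

A score near 0 suggests the task is easy for the smaller models, and a score near 1 suggests it is hard for them.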

We release the following final dataset splits on Hugging Face:
- [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2): The full SYNTHETIC-2 dataset consisting of all prompts and completions along with rewards
- [SYNTHETIC-2-SFT-verified](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified): The SFT split of SYNTHETIC-2 with responses from DeepSeek-R1-0528 verified as correct (a reward of 1 for binary rewards, or above 0.7 for non-binary rewards)
- [SYNTHETIC-2-SFT-unverified](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-unverified): The SFT split of SYNTHETIC-2 with all responses, including those not verified as correct
- [SYNTHETIC-2-RL](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-RL): The RL subset of SYNTHETIC-2 with difficulty annotations from Qwen3-32B, Qwen3-4B and DeepSeek-R1-0528-Qwen3-8B
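
The verification criterion stated for the SFT-verified split (a reward of 1 for binary rewards, above 0.7 for non-binary rewards) can be sketched as a small predicate. This is only an illustration of the stated rule: the function name `is_verified`, the `binary` flag, and the record layout in the usage lines are hypothetical and are not taken from the dataset schema.

```python
def is_verified(reward: float, binary: bool) -> bool:
    """Apply the stated rule for counting a response as verified.

    Binary rewards must equal 1; non-binary rewards must be strictly
    greater than 0.7. How a task is flagged as binary is an assumption
    of this sketch.
    """
    if binary:
        return reward == 1
    return reward > 0.7


# Hypothetical records: (reward, whether the task uses a binary reward).
samples = [(1, True), (0, True), (0.85, False), (0.7, False)]
verified = [s for s in samples if is_verified(*s)]
```

Under this rule a non-binary reward of exactly 0.7 is not counted as verified, since the threshold is described as "over 0.7".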
Provider:
maas
Created:
2025-07-11



