
SYNTHETIC-2

ModelScope community. Updated 2025-12-05. Indexed 2025-07-12.
Download link:
https://modelscope.cn/datasets/PrimeIntellect/SYNTHETIC-2
Description:
# SYNTHETIC-2

SYNTHETIC-2 is an open reasoning dataset spanning a variety of math, coding, and general reasoning tasks, along with reasoning traces generated collaboratively. The dataset contains both high-quality reasoning traces from DeepSeek-R1-0528, well suited for SFT, and multiple reasoning traces from smaller models, which can be used for difficulty estimation.

To read more about our data collection approach, check out our [blog post](https://www.primeintellect.ai/blog/synthetic-2-release).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64a32edf17b9f57eaec2ea65/l3MK8DdljMliCgif36aM3.png)

We release the following final dataset splits on Hugging Face:

- [SYNTHETIC-2](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2): The full SYNTHETIC-2 dataset, consisting of all prompts and completions along with rewards
- [SYNTHETIC-2-SFT-verified](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-verified): The SFT split of SYNTHETIC-2, with responses from DeepSeek-R1-0528 verified as correct (a reward of 1 for binary rewards and over 0.7 for non-binary rewards)
- [SYNTHETIC-2-SFT-unverified](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-SFT-unverified): The SFT split of SYNTHETIC-2 with all responses, including those not verified as correct
- [SYNTHETIC-2-RL](https://huggingface.co/datasets/PrimeIntellect/SYNTHETIC-2-RL): The RL subset of SYNTHETIC-2, with difficulty annotations from Qwen3-32B, Qwen3-4B, and DeepSeek-R1-0528-Qwen3-8B
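
The verification rule stated for the SFT-verified split (a reward of 1 for binary rewards, over 0.7 for non-binary rewards) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the record fields `reward` and `binary`, the helper names, and the use of a pass rate as a difficulty proxy are all assumptions made here for the example, and the actual column names in the released splits may differ.

```python
from statistics import mean

# Threshold taken from the dataset description: non-binary rewards must exceed 0.7.
NON_BINARY_THRESHOLD = 0.7


def is_verified(reward: float, binary: bool) -> bool:
    """Return True if a response counts as verified under the stated rule."""
    if binary:
        # Binary rewards are verified only when the reward is exactly 1.
        return reward == 1
    # Non-binary rewards are verified when strictly above the threshold.
    return reward > NON_BINARY_THRESHOLD


def pass_rate(rewards: list[float], binary: bool) -> float:
    """Fraction of sampled responses that are verified.

    Assumption: a lower pass rate from smaller models is read as a harder
    prompt. The source does not specify how difficulty is computed.
    """
    if not rewards:
        raise ValueError("rewards must not be empty")
    return mean(1.0 if is_verified(r, binary) else 0.0 for r in rewards)


# Hypothetical records; the field names are illustrative, not the real schema.
records = [
    {"prompt": "p1", "reward": 1.0, "binary": True},
    {"prompt": "p2", "reward": 0.0, "binary": True},
    {"prompt": "p3", "reward": 0.85, "binary": False},
    {"prompt": "p4", "reward": 0.7, "binary": False},
]

verified = [r for r in records if is_verified(r["reward"], r["binary"])]
verified_prompts = [r["prompt"] for r in verified]
```

With these hypothetical records, a non-binary reward of exactly 0.7 is excluded, since the description says "over 0.7".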

Provider:
maas
Created:
2025-07-11