INTELLECT-2-RL-Dataset

Name: INTELLECT-2-RL-Dataset
Creator: maas
Published: 2025-12-05 16:34:15
License: No description available

ModelScope community. Updated 2025-12-05; indexed 2025-05-17.

Download link:

https://modelscope.cn/datasets/PrimeIntellect/INTELLECT-2-RL-Dataset

Description:

# INTELLECT-2

INTELLECT-2 is a 32 billion parameter language model trained through a reinforcement learning run leveraging globally distributed, permissionless GPU resources contributed by the community.

The model was trained using [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl), a framework designed for distributed asynchronous RL, using GRPO over verifiable rewards along with modifications for improved training stability. For detailed information on our infrastructure and training recipe, see our [technical report](https://storage.googleapis.com/public-technical-paper/INTELLECT_2_Technical_Report.pdf).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64a32edf17b9f57eaec2ea65/aHUkuQFdAL7_ruLpTmmG8.png)

## Model Information

- Training Dataset (verifiable math & coding tasks): [PrimeIntellect/Intellect-2-RL-Dataset](https://huggingface.co/datasets/PrimeIntellect/INTELLECT-2-RL-Dataset)
- Base Model: [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B)
- Training Code: [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl)

## Usage

INTELLECT-2 is based on the `qwen2` architecture, making it compatible with popular libraries and inference engines such as [vllm](https://github.com/vllm-project/vllm) or [sglang](https://github.com/sgl-project/sglang).

Given that INTELLECT-2 was trained with a length control budget, you will achieve the best results by appending the prompt `"Think for 10000 tokens before giving a response."` to your instruction. As reported in our technical report, the model did not train for long enough to fully learn the length control objective, which is why results won't differ strongly if you specify lengths other than 10,000. If you wish to do so, you can expect the best results with 2000, 4000, 6000 and 8000, as these were the other target lengths present during training.

## Performance

During training, INTELLECT-2 improved upon QwQ in its mathematical and coding abilities. Performance on IFEval slightly decreased, which can likely be attributed to the lack of diverse training data and pure focus on mathematics and coding.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64a32edf17b9f57eaec2ea65/4k_Nmj2g8MqC7I6ORIkMH.png)

| **Model**           | **AIME24** | **AIME25** | **LiveCodeBench (v5)** | **GPQA-Diamond** | **IFEval** |
| ------------------- | ---------- | ---------- | ---------------------- | ---------------- | ---------- |
| INTELLECT-2         | **78.8**   | 64.9       | **67.8**               | 66.8             | 81.5       |
| QwQ-32B             | 76.6       | 64.8       | 66.1                   | 66.3             | 83.4       |
| Qwen-R1-Distill-32B | 69.9       | 58.4       | 55.1                   | 65.2             | 72.0       |
| Deepseek-R1         | 78.6       | 65.1       | 64.1                   | 71.6             | 82.7       |

## Citation

```
@misc{primeintellectteam2025intellect2reasoningmodeltrained,
      title={INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning},
      author={Prime Intellect Team and Sami Jaghouar and Justus Mattern and Jack Min Ong and Jannik Straube and Manveer Basra and Aaron Pazdera and Kushal Thaman and Matthew Di Ferrante and Felix Gabriel and Fares Obeid and Kemal Erdem and Michael Keiblinger and Johannes Hagemann},
      year={2025},
      eprint={2505.07291},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.07291},
}
```
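
The length-control prompt described under Usage can be sketched as a small helper. This is an illustrative sketch, not part of the model's tooling: the function names, the blank-line separator between the instruction and the appended sentence, and the choice to snap a requested budget to the nearest trained value are all assumptions. Only the prompt wording and the budget values (2000, 4000, 6000, 8000, 10000) come from the text above.

```python
# Target lengths that the description says were present during training.
TRAINED_BUDGETS = (2000, 4000, 6000, 8000, 10000)


def nearest_trained_budget(requested: int) -> int:
    """Snap a requested thinking budget to the closest trained value.

    Hypothetical helper: ties are resolved toward the larger budget.
    """
    return min(TRAINED_BUDGETS, key=lambda b: (abs(b - requested), -b))


def with_length_budget(instruction: str, budget: int = 10000) -> str:
    """Append the length-control sentence to an instruction.

    The separator (a blank line) is an assumption; the source only says
    the sentence is appended to the instruction.
    """
    budget = nearest_trained_budget(budget)
    suffix = f"Think for {budget} tokens before giving a response."
    return f"{instruction.rstrip()}\n\n{suffix}"


if __name__ == "__main__":
    print(with_length_budget("Prove that the sum of two even integers is even."))
```

The returned string would then be sent as the user message through whichever inference engine is in use, such as vllm or sglang.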

Provider:

maas

Created:

2025-05-13
