s1-m_beta
ModelScope community. Updated 2026-01-09; indexed 2025-03-15.
Download link:
https://modelscope.cn/datasets/PKU-Alignment/s1-m_beta
Description:
# S1-M Dataset (Beta)
[🏠 Homepage](https://github.com/PKU-Alignment/s1-m) | [👍 Our Official Code Repo](https://github.com/PKU-Alignment/s1-m) | [🤗 S1-M-7B Model (Beta)](https://huggingface.co/PKU-Alignment/s1-m_7b_beta)
S1-M Dataset (Beta) is an open-source text-image-to-text (TI2T) reasoning dataset used to train the S1-M Model (Beta) to follow a "think first, then respond" paradigm. Its prompts and images come from two open-source datasets, [align-anything](https://huggingface.co/datasets/PKU-Alignment/align-anything) and [multimodal-open-r1-8k-verified](https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified), which account for 49.62% and 50.38% of the data respectively; the mix aims to balance the model's general capabilities with its mathematical abilities. The data was annotated with **Claude 3.7 Sonnet 20250219**, which is guided to think first and then answer by the system prompt shown below.
```
You are a reasoning model with advanced analytical capabilities. I will provide an image and ask a question about it. Your task is to analyze the image thoroughly and answer my question accurately.
Response format:
<think>
[step-by-step reasoning process]
</think>
[final answer]
Guidelines:
1. Place your reasoning process between <think> and </think> tags first, and the final answer after that.
2. The reasoning process can include expressions like "let me think," "oh, I see,", "maybe I should think about it from a different angle," or other natural language thought expressions.
3. For multiple-choice questions, end with "Answer: [LETTER]" where LETTER corresponds to your selected option.
Remember to be thorough in your analysis but concise in your final answer.
```
The system prompt requires Claude 3.7 to first place its thinking process between the `<think>` and `</think>` markers and then give the final answer based on that thinking, which yields the "think first, then respond" paradigm.
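A response that follows this format can be split back into its reasoning and answer parts. The snippet below is a minimal sketch, not code from the S1-M repository: `ParsedResponse` and `parse_response` are hypothetical names, and it assumes each response contains exactly one `<think>...</think>` block and that multiple-choice answers end with `Answer: X` for a single uppercase letter.

```python
import re
from typing import NamedTuple, Optional


class ParsedResponse(NamedTuple):
    thinking: str           # text found between <think> and </think>
    answer: str             # everything after the closing </think> tag
    choice: Optional[str]   # option letter for multiple-choice items, else None


# Hypothetical helper patterns (assumptions, not part of the S1-M code base).
_THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)
_CHOICE_RE = re.compile(r"Answer:\s*\[?([A-Z])\]?\s*$")


def parse_response(text: str) -> ParsedResponse:
    """Split an annotated response into reasoning, answer, and option letter."""
    match = _THINK_RE.search(text)
    if match is None:
        raise ValueError("response does not contain a <think>...</think> block")
    thinking = match.group(1).strip()
    answer = match.group(2).strip()
    # Only look for the option letter at the very end of the answer.
    choice_match = _CHOICE_RE.search(answer)
    choice = choice_match.group(1) if choice_match else None
    return ParsedResponse(thinking, answer, choice)
```

Responses that lack the thinking block raise a `ValueError`, so malformed annotations can be filtered out rather than silently accepted.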
Because each response includes this thinking process, the annotated responses are longer in tokens than answer-only responses would be. The distribution of the combined length of thinking content and answer content in the S1-M Dataset (Beta) is shown in the figure below.
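A length distribution of this kind can be approximated with a simple histogram. The sketch below is illustrative only: `length_histogram` is a hypothetical helper, and it counts whitespace-separated words as a crude stand-in for tokens, since the tokenizer used for the original figure is not stated here.

```python
from collections import Counter
from typing import Dict, Iterable


def length_histogram(responses: Iterable[str], bin_width: int = 100) -> Dict[int, int]:
    """Count responses per length bin; keys are the lower edges of the bins."""
    counts: Counter = Counter()
    for text in responses:
        # Assumption: whitespace splitting approximates the token count.
        n_tokens = len(text.split())
        counts[(n_tokens // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))
```

Swapping the `text.split()` call for a real tokenizer would give counts closer to what a model actually sees.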

**Note: The S1-M Dataset (Beta) is still under development and the final version has not yet been released.**
Provider:
maas
Created:
2025-03-14



