s1-m_beta

Name: s1-m_beta
Creator: maas
Published: 2026-01-09 00:15:59
License: No description available

ModelScope community; updated 2026-01-09, indexed 2025-03-15

Download link:

https://modelscope.cn/datasets/PKU-Alignment/s1-m_beta


Description:

# S1-M Dataset (Beta)

[🏠 Homepage](https://github.com/PKU-Alignment/s1-m) | [👍 Our Official Code Repo](https://github.com/PKU-Alignment/s1-m) | [🤗 S1-M-7B Model (Beta)](https://huggingface.co/PKU-Alignment/s1-m_7b_beta)

S1-M Dataset (Beta) is an open-source TI2T (text-and-image-to-text) reasoning dataset used to train the S1-M Model (Beta), giving it a "think first, then respond" paradigm.

The prompts and images in the S1-M Dataset (Beta) come from two open-source datasets: [align-anything](https://huggingface.co/datasets/PKU-Alignment/align-anything) and [multimodal-open-r1-8k-verified](https://huggingface.co/datasets/lmms-lab/multimodal-open-r1-8k-verified). They account for 49.62% and 50.38% of the data respectively, a split intended to balance the model's general capabilities against its mathematical abilities.

The data is annotated with **Claude 3.7 Sonnet 20250219**, which is guided to think first and then answer by the following system prompt:

```
You are a reasoning model with advanced analytical capabilities.
I will provide an image and ask a question about it.
Your task is to analyze the image thoroughly and answer my question accurately.

Response format:
<think>
[step-by-step reasoning process]
</think>
[final answer]

Guidelines:
1. Place your reasoning process between <think> and </think> tags first, and the final answer after that.
2. The reasoning process can include expressions like "let me think," "oh, I see,", "maybe I should think about it from a different angle," or other natural language thought expressions.
3. For multiple-choice questions, end with "Answer: [LETTER]" where LETTER corresponds to your selected option.

Remember to be thorough in your analysis but concise in your final answer.
```

The system prompt requires Claude 3.7 to first place its thinking process between the markers `<think>` and `</think>`, and then give the final answer based on that thinking. This is what produces the "think first, then respond" paradigm.

Because each annotated response contains the thinking process as well as the answer, the responses are long in tokens. The figure below shows the distribution of the combined length (thinking content plus answer content) in the S1-M Dataset (Beta).

![Token Length Distribution](token_length_distribution.png)

**Note: The S1-M Dataset (Beta) is still under development and the final version has not yet been released.**
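
The response format set by the system prompt can be split back into its reasoning and answer parts with a small parser. The following is a minimal sketch, not part of the official code repo: the names `parse_response` and `ParsedResponse` are hypothetical, and it assumes each response contains exactly one `<think>...</think>` block followed by the answer, as the system prompt requests. Actual samples may deviate from this shape.

```python
import re
from typing import NamedTuple, Optional

# Assumption: one <think>...</think> block, then the final answer,
# as requested by the system prompt quoted in this dataset card.
THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)

# Guideline 3 of the prompt: multiple-choice answers end with "Answer: [LETTER]".
# Brackets around the letter are tolerated but not required.
CHOICE_RE = re.compile(r"Answer:\s*\[?([A-Za-z])\]?\s*$")


class ParsedResponse(NamedTuple):
    reasoning: str          # text between <think> and </think>
    answer: str             # everything after </think>
    choice: Optional[str]   # selected option letter, if the answer ends with one


def parse_response(text: str) -> Optional[ParsedResponse]:
    """Split a response into reasoning and answer; return None if tags are missing."""
    match = THINK_RE.search(text)
    if match is None:
        return None
    reasoning = match.group(1).strip()
    answer = match.group(2).strip()
    choice_match = CHOICE_RE.search(answer)
    choice = choice_match.group(1).upper() if choice_match else None
    return ParsedResponse(reasoning, answer, choice)


if __name__ == "__main__":
    sample = (
        "<think>\nLet me think. The legs are 3 and 4.\n</think>\n"
        "The hypotenuse is 5.\nAnswer: B"
    )
    print(parse_response(sample))
```

Returning `None` for responses without the tags, rather than raising, makes it easy to count or filter malformed samples when scanning a whole split.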


Provider:

maas

Created:

2025-03-14
