UltraInteract_pair

Name: UltraInteract_pair
Creator: maas
Published: 2025-12-05 16:34:37
License: No description available

ModelScope community. Updated: 2025-12-05. Indexed: 2025-05-17.

Download link:

https://modelscope.cn/datasets/OpenBMB/UltraInteract_pair


Resource overview:

## Introduction

- 📜 [Paper](https://arxiv.org/abs/2404.02078)
- 🤗 [Eurus Collection](https://huggingface.co/collections/openbmb/eurus-660bc40bec5376b3adc9d1c5)
- 🤗 UltraInteract
  - [SFT](https://huggingface.co/datasets/openbmb/UltraInteract_sft)
  - [Preference Learning](https://huggingface.co/datasets/openbmb/UltraInteract_pair)
- [GitHub Repo](https://github.com/OpenBMB/Eurus)

UltraInteract is a large-scale, high-quality alignment dataset specifically designed for complex reasoning tasks. For each instruction, it includes a preference tree consisting of:

- (1) reasoning chains with diverse planning strategies in a unified format
- (2) multi-turn interaction trajectories with the environment and the critique
- (3) pairwise data to facilitate preference learning

## Structure

UltraInteract collects a preference tree for each instruction, with the instruction being the root and each action a node. A trajectory is a root-to-leaf path consisting of a sequence of actions. In each preference tree, all nodes of correct actions and all trajectories ending with correct actions can be used for SFT. Paired correct and incorrect nodes or trajectories can be used for preference learning.

<img src="./figures/tree.png" alt="tree" style="zoom: 20%;" />

## Illustrative Example

Here is an illustrative example of an UltraInteract trajectory over two turns. In each turn, the actor model generates step-by-step reasoning chains, and the environment and the critique model provide observations and textual critique, respectively.

<img src="./figures/ui_example.png" alt="ui_example" style="zoom: 25%;" />

## Stats

Below are some statistics about UltraInteract. It consists of 86k instructions, 286k correct answers, and 219k pairs.

<img src="./figures/stats.png" alt="stats" style="zoom: 40%;" />

## Dataset Format (Preference Learning)

```JSONC
{
  "id": "d1f094e0e13096bc1bb6260e1e0d807e2c4ebfbbf4a4373f04cbed001c385c3c_5_3", // unique data id with three parts separated by '_': the unique string, the depth of the preference tree, and the current turn
  "parent_id": "d1f094e0e13096bc1bb6260e1e0d807e2c4ebfbbf4a4373f04cbed001c385c3c", // data with the same parent_id have the same instruction but different actions
  "task": "Math_PoT", // task type, choices = ["Coding", "Math_Cot", "Math_PoT", "Logic"]
  "dataset": "MATH", // the dataset where the instruction comes from
  "trajectory": [
    {
      "from": "user",
      "value": "Tool available:\n[1] Python interpreter\nWhen you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment.\nSolve the following math problem step-by-step.\nSimplify your answer as much as possible.\nSimplify $(-3-2i)-(1+4i)$." // the task instruction
    },
    {
      "from": "assistant",
      "value": "Step 1: To simplify the expression $(-3-2i)-(1+4i)$, we need to distribute the negative sign to both terms inside the parentheses.\nStep 2: We can then combine like terms by adding the real parts and the imaginary parts separately.\n```python\n# Step 3: Distribute the negative sign to both terms inside the parentheses.\n(-3) - (-1) + (-2i) - (-4i)\n# Step 4: Combine like terms by adding the real parts and the imaginary parts separately.\n-3 + 1 + (-2i + 4i)\n```" // the assistant's action (turn 1)
    },
    {
      "from": "user",
      "value": "Out: Cell In[1], line 2\n (-3) - (-1) + (-2i) - (-4i)\n ^\nSyntaxError: invalid syntax\nYour answer is wrong.\nYour thought process for simplifying the given complex numbers is correct. However, you made a syntax error in the code. In Python, the multiplication operation must be explicitly written out with an asterisk (*), even for complex numbers. Remember, the imaginary unit in Python is denoted as j and not i. Try correcting the syntax and calculate again." // the observation from the Python interpreter and feedback (turn 1)
    },
    {
      "from": "assistant",
      "value": "Step 1: To simplify the expression $(-3-2i)-(1+4i)$, we need to distribute the negative sign to both terms inside the parentheses.\nStep 2: We can then combine like terms by adding the real parts and the imaginary parts separately.\n```python\n# Step 3: Distribute the negative sign to both terms inside the parentheses.\n(-3) - (-1) + (-2j) - (-4j)\n# Step 4: Combine like terms by adding the real parts and the imaginary parts separately.\n-3 + 1 + (-2j + 4j)\n```" // the assistant's action according to the observation and feedback (turn 2)
    },
    {
      "from": "user",
      "value": "Out: (-2+2j)\nYour answer is wrong.\nYour thought process for simplifying the given complex numbers is on the right track. However, you made an arithmetic error when adding the real parts (-3 + 1 ≠ -2) and the imaginary parts (-2j + 4j ≠ 2j). You should have combined the like terms correctly to get the right answer -4 and -6j respectively. Please try again with the correct arithmetic calculations." // the observation from the Python interpreter and feedback (turn 2)
    }
  ], // trajectory is the interaction between user and assistant; the first element is the task instruction
  "chosen": "Step 1: Split the expression into real and imaginary terms.\nStep 2: Subtract the real terms from each other.\nStep 3: Subtract the imaginary terms from each other.\nStep 4: Combine the results of the subtraction to get the simplified form of the expression.\n\n\n\n```python\n# Step 5: Split the expression into real and imaginary parts and then subtract them.\nreal_part = -3 - 1 # Subtract the real parts\nimaginary_part = -2 - 4 # Subtract the imaginary parts\n# Step 6: Combine the real and imaginary parts to get the result.\nresult = f\"{real_part}{imaginary_part:+d}i\"\nprint(result)\n```", // the chosen answer based on the trajectory
  "rejected": "Step 1: To simplify the expression $(-3-2i)-(1+4i)$, we need to distribute the negative sign to both terms inside the parentheses.\nStep 2: We can then combine like terms by adding the real parts and the imaginary parts separately.\n```python\n# Step 3: Distribute the negative sign to both terms inside the parentheses.\n(-3) - 1 - (-2j) - 4j\n# Step 4: Combine like terms by adding the real parts and the imaginary parts separately.\n-3 - 1 - 2j - 4j\n```" // the rejected answer based on the trajectory
}
```

## Citation

```bib
@misc{yuan2024advancing,
  title={Advancing LLM Reasoning Generalists with Preference Trees},
  author={Lifan Yuan and Ganqu Cui and Hanbin Wang and Ning Ding and Xingyao Wang and Jia Deng and Boji Shan and Huimin Chen and Ruobing Xie and Yankai Lin and Zhenghao Liu and Bowen Zhou and Hao Peng and Zhiyuan Liu and Maosong Sun},
  year={2024},
  eprint={2404.02078},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
```


Provider: maas

Created: 2025-05-15
