UltraInteract_sft
收藏魔搭社区2025-12-05 更新2025-05-17 收录
下载链接:
https://modelscope.cn/datasets/OpenBMB/UltraInteract_sft
下载链接
链接失效反馈官方服务:
资源简介:
## Introduction
- 📜 [Paper](https://arxiv.org/abs/2404.02078)
- 🤗 [Eurus Collection](https://huggingface.co/collections/openbmb/eurus-660bc40bec5376b3adc9d1c5)
- 🤗 UltraInteract
- [SFT](https://huggingface.co/datasets/openbmb/UltraInteract_sft)
- [Preference Learning](https://huggingface.co/datasets/openbmb/UltraInteract_pair)
- [GitHub Repo](https://github.com/OpenBMB/Eurus)
UltraInteract is a large-scale, high-quality alignment dataset specifically designed for complex reasoning tasks. For each instruction, it includes a preference tree consisting of
- (1) reasoning chains with diverse planning strategies in a unified format
- (2) multi-turn interaction trajectories with the environment and the critique
- (3) pairwise data to facilitate preference learning
## Structure
UltraInteract collects a preference tree for each instruction, with the instruction being the root and each action a node. A trajectory is a root-to-leaf path consisting of a sequence of actions. In each preference tree, all nodes of correct actions and all trajectories ending with correct actions can be used for SFT. Paired correct and incorrect nodes or trajectories can be used for preference learning.
<img src="./figures/tree.png" alt="tree" style="zoom: 20%;" />
## Illustrative Example
Here is an illustrative example of an UltraInteract trajectory over two turns. In each turn, the actor model generates step-by-step reasoning chains, and the environment and the critique model provide observations and textual critique respectively.
<img src="./figures/ui_example.png" alt="ui_example" style="zoom: 25%;" />
## Stats
Below are some statistics about UltraInteract. It consists of 86k instructions, 286k correct answers, and 219k pairs.
<img src="./figures/stats.png" alt="stats" style="zoom: 40%;" />
## Dataset Format (SFT)
Coding
```JSONC
{
"id": "92a7356beaa75391539cb79624bdf85b7e5d36d5e74e46a2f1c4001f40b3e055_1_1", // unique data id, contains three parts, the unique string, the depth of the preference tree, and current turn, separated by '_'
"parent_id": "535550938ff58bd73cd7fb85be834bb90b7600daf6b6f6a16b64091e5ecbfe43", // data with the same parent_id have the same instruction but different actions.
"task": "Coding", // task type, choices = ["Coding", "Math_Cot", "Math_PoT", "Logic"]
"dataset": "codecontest", // the dataset where the instruction comes from
"instruction": "Write Python code to solve the task.\nMike has a string s consisting of only lowercase English letters. He wants to change exactly one character from the string so that the resulting one is a palindrome. \n\nA palindrome is a string that reads the same backward as forward, for example strings \"z\", \"aaa\", \"aba\", \"abccba\" are palindromes, but strings \"codeforces\", \"reality\", \"ab\" are not.\n\nInput\n\nThe first and single line contains string s (1 ≤ |s| ≤ 15).\n\nOutput\n\nPrint \"YES\" (without quotes) if Mike can change exactly one character so that the resulting string is palindrome or \"NO\" (without quotes) otherwise. \n\nExamples\n\nInput\n\nabccaa\n\n\nOutput\n\nYES\n\n\nInput\n\nabbcca\n\n\nOutput\n\nNO\n\n\nInput\n\nabcda\n\n\nOutput\n\nYES", // the task description.
"response": "
Step 1: This problem asks us to determine if it is possible to change exactly one character in the string to make it a palindrome.
Step 2: We can start by iterating through the characters in the string and counting the number of differences between corresponding characters from the beginning and end of the string.
Step 3: If the length of the string is odd and there is only one difference, or if the length is even and there is exactly one difference, then it is possible to change exactly one character to make the string a palindrome.
Step 4: We need to consider edge cases such as a single character string or an empty string.
```python
# Step 5: Take the input string
s = input()
# Step 6: Calculate the length of the string
n = len(s)
# Step 7: Initialize a counter to keep track of the number of differences
cnt = 0
# Step 8: Iterate through the first half of the string and count the number of differences between corresponding characters from the beginning and end of the string.
for i in range(n//2):
if s[i] != s[n-i-1]:
cnt += 1
# Step 9: If the length of the string is odd and there are no differences, or if the length is even and there is exactly one difference, print "YES" (without quotes)
if (n % 2 == 1 and cnt == 0) or (cnt == 1):
print('YES')
# Step 10: Otherwise, print "NO" (without quotes)
else:
print('NO')
```" // The step-by-step solution.
}
```
Math
```JSONC
{
"id": "647b2e8706cf459a8e3ec8e99727d2d51ff3414bf187db43d92cff87f8012713_1_1",
"parent_id": "499d250c5852eddcd48a28b6c173f1a2349a16555990a1cd6fb5c7954cb94587",
"task": "Math_PoT",
"dataset": "mathqa",
"instruction": "Tool available:\n[1] Python interpreter\nWhen you send a message containing Python code to python, it will be executed in a stateful Jupyter notebook environment.\nSolve the following math problem step-by-step.\nthe length of a rectangle is doubled while its width is doubled . what is the % change in area ?",
"response": "
Step 1: The area of a rectangle is given by the product of its length and width. If the length is doubled, the area changes from l*w to 2l*w. Similarly, if the width is doubled, the area changes from l*w to l*2w.
Step 2: If both the length and width are doubled, the area changes from l*w to 2l*2w. The percentage change in area is given by ((new area - old area) / old area)*100.
Step 3: We can calculate the percentage change in area using the formula above.
Step 4: By creating a tool to calculate the percentage change in area, we can easily solve the question.
```python
# Step 5: Define a tool to calculate the percentage change in area
def calculate_percentage_change(length, width):
# Step 6: Calculate the old and new areas
old_area = length * width
new_area = 2 * length * 2 * width
# Step 7: Calculate the percentage change
percentage_change = ((new_area - old_area) / old_area) * 100
return percentage_change
# Step 8: Call the function to solve the question
percentage_change = calculate_percentage_change(2, 2)
print(percentage_change)
```"
}
```
Logic
```JSONC
{
"id": "92a19848da4fb0fc0b414f061f7500a81ea4cbd8f922ca1e32a6a15ef6f7e1a0_1_1",
"parent_id": "107cc9147d8c09e62e2ba9e74cb41d9dbafd9a3adf62b67ba3da2b31c81b3300",
"task": "Logic",
"dataset": "hotpotqa",
"instruction": "
Tool available:\n[1] wikipedia_search(query: str) -> str\nThe Wikipedia Search tool provides access to a vast collection of articles covering a wide range of topics.\nCan query specific keywords or topics to retrieve accurate and comprehensive information.
Solve the following problem step-by-step:\nWhich musician or group is from the US, 3OH!3 or Regina Spektor?
Step 1: To determine which musician or group is from the US, we can search for information about each of them.
```python
# Step 2: Searching for the information about 3OH!3.
print(wikipedia_search('3OH!3'))
```
Page: 3OH!3
Summary: 3OH!3 (pronounced \"three oh three\") is an American electronic music duo from Boulder, Colorado, made up of Sean Foreman and Nathaniel Motte. They are best known for their single \"DONTTRUSTME\" from their album Want, which reached number seven on the Billboard Hot 100. Their second single, a remix of \"Starstrukk\" featuring Katy Perry from Want, was a top ten hit in the United Kingdom, Ireland, Finland, Poland, and Australia. They gained further recognition by featuring Kesha on the song \"My First Kiss\", which was made the lead single from their album Streets of Gold. The album later peaked at number seven on the Billboard 200.\n\n\n\nPage: 3OH!3 (album)\nSummary: 3OH!3 is the debut studio album by American electronic music band 3OH!3. The tracks \"Holler Til You Pass Out\" and \"Chokechain\" were later remixed for the band's major label debut. \"Holler Til You Pass Out\", \"Chokechain\", and \"Dance with Me\" are excluded from the iTunes release of the album.",
"response": "
Step 1: The provided information confirms that 3OH!3 is an American electronic music duo from Boulder, Colorado.
Step 2: We no longer need to perform a search for Regina Spektor as we were only tasked to determine which of the two options is from the US and we have already confirmed 3OH!3's origin.
Step 3: We can confidently conclude that 3OH!3 is the musician/group from the US.
Answer:3OH!3"
}
```
## Citation
```bib
@misc{yuan2024advancing,
title={Advancing LLM Reasoning Generalists with Preference Trees},
author={Lifan Yuan and Ganqu Cui and Hanbin Wang and Ning Ding and Xingyao Wang and Jia Deng and Boji Shan and Huimin Chen and Ruobing Xie and Yankai Lin and Zhenghao Liu and Bowen Zhou and Hao Peng and Zhiyuan Liu and Maosong Sun},
year={2024},
eprint={2404.02078},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
```
## 简介
- 📜 [论文](https://arxiv.org/abs/2404.02078)
- 🤗 [Eurus 合集](https://huggingface.co/collections/openbmb/eurus-660bc40bec5376b3adc9d1c5)
- 🤗 UltraInteract
- [监督微调(Supervised Fine-Tuning,SFT)数据集](https://huggingface.co/datasets/openbmb/UltraInteract_sft)
- [偏好学习数据集](https://huggingface.co/datasets/openbmb/UltraInteract_pair)
- [GitHub 仓库](https://github.com/OpenBMB/Eurus)
UltraInteract是专为复杂推理任务打造的大规模高质量对齐数据集。针对每条指令,该数据集包含一棵偏好树,涵盖以下三类内容:
1. 采用统一格式、搭载多样规划策略的推理链
2. 与环境交互并接收评论的多轮交互轨迹
3. 用于辅助偏好学习的成对样本
## 数据结构
UltraInteract为每条指令构建一棵偏好树,以指令作为根节点,每个动作为一个节点。轨迹是由一系列动作组成的从根节点到叶节点的路径。在每棵偏好树中,所有正确动作的节点以及以正确动作结尾的轨迹均可用于监督微调(SFT);成对的正确与错误节点或轨迹可用于偏好学习。
<img src="./figures/tree.png" alt="tree" style="zoom: 20%;" />
## 示例说明
以下为UltraInteract两交互轨迹的示例。在每一轮交互中,行动者模型生成逐步推理链,环境与评论模型分别提供观测结果与文本评论。
<img src="./figures/ui_example.png" alt="ui_example" style="zoom: 25%;" />
## 统计信息
以下为UltraInteract的部分统计数据:该数据集共包含8.6万条指令、28.6万条正确答案以及21.9万对样本。
<img src="./figures/stats.png" alt="stats" style="zoom: 40%;" />
## 数据集格式(监督微调)
### 编码任务
JSONC
{
"id": "92a7356beaa75391539cb79624bdf85b7e5d36d5e74e46a2f1c4001f40b3e055_1_1",
"parent_id": "535550938ff58bd73cd7fb85be834bb90b7600daf6b6f6a16b64091e5ecbfe43",
"task": "Coding",
"dataset": "codecontest",
"instruction": "编写Python代码解决该任务。
Mike有一个仅由小写英文字母组成的字符串s。他希望恰好修改字符串中的一个字符,使得修改后的字符串成为回文串。
回文串是指正读和反读完全相同的字符串,例如“z”、“aaa”、“aba”、“abccba”均为回文串,而“codeforces”、“reality”、“ab”则不属于回文串。
输入
输入仅一行,为字符串s(1 ≤ |s| ≤ 15)。
输出
若可通过恰好修改一个字符使字符串成为回文串,则输出“YES”(不包含引号),否则输出“NO”(不包含引号)。
示例
输入
abccaa
输出
YES
输入
abbcca
输出
NO
输入
abcda
输出
YES",
"response": "
步骤1:本题要求判断是否可以通过恰好修改字符串中的一个字符,使其成为回文串。
步骤2:我们可以遍历字符串,统计字符串首尾对应位置字符的差异数量。
步骤3:若字符串长度为奇数且仅存在一处差异,或字符串长度为偶数且恰好存在一处差异,则可通过修改一个字符使字符串成为回文串。
步骤4:我们需要考虑单字符字符串、空字符串等边界情况。
python
# 步骤5:读取输入字符串
s = input()
# 步骤6:计算字符串长度
n = len(s)
# 步骤7:初始化计数器,统计差异字符数
cnt = 0
# 步骤8:遍历字符串前半部分,统计首尾对应位置的字符差异数
for i in range(n//2):
if s[i] != s[n-i-1]:
cnt += 1
# 步骤9:若字符串长度为奇数且差异数为0,或差异数恰好为1,则输出“YES”(不包含引号)
if (n % 2 == 1 and cnt == 0) or (cnt == 1):
print('YES')
# 步骤10:否则输出“NO”(不包含引号)
else:
print('NO')
"
}
### 数学任务(PoT)
JSONC
{
"id": "647b2e8706cf459a8e3ec8e99727d2d51ff3414bf187db43d92cff87f8012713_1_1",
"parent_id": "499d250c5852eddcd48a28b6c173f1a2349a16555990a1cd6fb5c7954cb94587",
"task": "Math_PoT",
"dataset": "mathqa",
"instruction": "可用工具:
[1] Python 解释器
当你向python发送包含Python代码的消息时,代码将在有状态的Jupyter笔记本环境中执行。
逐步解决以下数学问题:将矩形的长度加倍,同时将其宽度加倍,面积的变化百分比是多少?",
"response": "
步骤1:矩形的面积等于长与宽的乘积。若仅将长度加倍,面积将从l*w变为2l*w;同理,若仅将宽度加倍,面积将变为l*2w。
步骤2:若同时将长度和宽度加倍,面积将从l*w变为2l*2w。面积的百分比变化可通过公式((新面积 - 旧面积) / 旧面积)*100计算。
步骤3:我们可通过上述公式计算面积的百分比变化。
步骤4:通过编写工具函数计算面积的百分比变化,即可轻松解决该问题。
python
# 步骤5:定义计算面积百分比变化的工具函数
def calculate_percentage_change(length, width):
# 步骤6:计算旧面积与新面积
old_area = length * width
new_area = 2 * length * 2 * width
# 步骤7:计算百分比变化
percentage_change = ((new_area - old_area) / old_area) * 100
return percentage_change
# 步骤8:调用函数解决该问题
percentage_change = calculate_percentage_change(2, 2)
print(percentage_change)
"
}
### 逻辑推理任务
JSONC
{
"id": "92a19848da4fb0fc0b414f061f7500a81ea4cbd8f922ca1e32a6a15ef6f7e1a0_1_1",
"parent_id": "107cc9147d8c09e62e2ba9e74cb41d9dbafd9a3adf62b67ba3da2b31c81b3300",
"task": "Logic",
"dataset": "hotpotqa",
"instruction": "
可用工具:
[1] wikipedia_search(query: str) -> str
维基百科搜索工具可访问涵盖广泛主题的海量文章库,支持通过特定关键词或主题检索准确且全面的信息。
逐步解决以下问题:3OH!3和Regina Spektor中哪位音乐家/组合来自美国?
步骤1:要确定哪位音乐家/组合来自美国,我们可以分别搜索二者的相关信息。
python
# 步骤2:搜索3OH!3的相关信息。
print(wikipedia_search('3OH!3'))
页面:3OH!3
摘要:3OH!3(发音为“three oh three”)是一支来自美国科罗拉多州博尔德的电子音乐双人组,成员为Sean Foreman与Nathaniel Motte。他们因专辑《Want》中的单曲《DONTTRUSTME》而广为人知,该单曲曾登顶Billboard Hot 100榜单第7位。他们的第二支单曲《Starstrukk》(与Katy Perry合作的混音版)出自专辑《Want》,曾在英国、爱尔兰、芬兰、波兰和澳大利亚进入榜单前十。他们还因与Kesha合作的歌曲《My First Kiss》获得更多关注,该歌曲作为专辑《Streets of Gold》的主打单曲发行,该专辑曾登顶Billboard 200榜单第7位。
页面:3OH!3(专辑)
摘要:3OH!3是美国电子音乐组合3OH!3的首张录音室专辑,曲目《Holler Til You Pass Out》与《Chokechain》之后被重新混音,收录于该组合的厂牌出道专辑。《Holler Til You Pass Out》、《Chokechain》与《Dance with Me》未收录于iTunes版本的专辑中。",
"response": "
步骤1:提供的信息已确认3OH!3是来自美国科罗拉多州博尔德的电子音乐双人组。
步骤2:我们无需再搜索Regina Spektor的信息,因为本题仅需判断二者中哪位来自美国,且我们已确认3OH!3的来源地。
步骤3:我们可以确信3OH!3是来自美国的音乐家/组合。
答案:3OH!3"
}
## 引用
bib
@misc{yuan2024advancing,
title={基于偏好树推进大语言模型(Large Language Model,LLM)推理通用能力},
author={Lifan Yuan and Ganqu Cui and Hanbin Wang and Ning Ding and Xingyao Wang and Jia Deng and Boji Shan and Huimin Chen and Ruobing Xie and Yankai Lin and Zhenghao Liu and Bowen Zhou and Hao Peng and Zhiyuan Liu and Maosong Sun},
year={2024},
eprint={2404.02078},
archivePrefix={arXiv},
primaryClass={cs.AI}
}
提供机构:
maas
创建时间:
2025-05-15



