baidu/TARA
Dataset Card for TARA
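Dataset Summary
TARA is a novel tool-augmented reward modeling dataset. It contains comprehensive comparison data that pairs human preferences with detailed tool invocation processes. The dataset was introduced in the accompanying paper and was used to train Themis-7b.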
Supported Tools
TARA supports multiple tools: Calculator, Code, Translator, Google Search, Calendar, Weather, WikiSearch, and Multi-tools.
Dataset Structure
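- calculator: preference data involving the Calculator tool, built on GSM-8K.
- code: preference data involving the Code tool, containing questions, positive code answers, and test cases.
- calendar: preference data involving the Calendar tool, whose main functions are determining the day of the week for a given date, computing the difference between two dates, and finding the date n days after another date.
- weather: preference data involving the Weather tool, which takes a city and a date as input and returns the weather for that city on that date.
- translator: preference data involving the Translator tool, containing QA instances in 7 different languages.
- webgpt: preference data involving the Google Search tool, built on the reward dataset webgpt_comparisons.
- wikisearch: preference data involving the WikiSearch tool, containing real, anonymized, aggregated queries issued to the Google search engine and annotated with Wikipedia pages.
- multi_tools: preference data involving both the calendar and weather tools.
- mix: the aggregation of the data above.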
- hh_rlhf: 150 instances randomly sampled from Anthropic/hh-rlhf.
- mix_hh: the aggregation of hh_rlhf and mix.
- truthful_qa: truthful_qa converted to preference format.
- retarded_bar: an English, preference-format version of retarded_bar.
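To make the calendar entry in the list above concrete, here is a minimal Python sketch of its three operations. It is an illustration built on the standard `datetime` module, not the tool implementation used to build TARA. Only the name `week_day` is taken from the data example (`Calendar.week_day`); `days_between` and `date_after` are hypothetical names, and the ISO `YYYY-MM-DD` input format is assumed from that same example.

```python
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
_WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def week_day(day: str) -> str:
    """Return the day of the week for an ISO date string (YYYY-MM-DD)."""
    return _WEEKDAYS[date.fromisoformat(day).weekday()]


def days_between(start: str, end: str) -> int:
    """Hypothetical helper: number of days from start to end."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def date_after(day: str, n: int) -> str:
    """Hypothetical helper: the date n days after the given date."""
    return (date.fromisoformat(day) + timedelta(days=n)).isoformat()


print(week_day("2004-05-03"))                    # Monday
print(days_between("2004-05-03", "2004-05-10"))  # 7
print(date_after("2004-05-03", 30))              # 2004-06-02
```

The first call reproduces the tool observation recorded for the question about 2004-05-03 in the calendar example.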
Data Instances
An example from TARA looks as follows:

```text
{
  "id": "calendar_train_135",
  "question": "Please let me know the day of the week that 2004-05-03 refers to.",
  "pos_answer": {
    "answer": "2004-05-03 is Monday",
    "score": 1.0,
    "actions": "Thought: I need to search what day of the week is 2004-05-03. Action: Calendar.week_day Action Input: 2004-05-03 Observation: Monday",
    "score_agent": {
      "explanation": "The answer given, which states that the date 2004-05-03 is a Monday, aligns perfectly with the tools execution result that also shows that the day was a Monday. This means that the provided answer is accurate and well-supported by the tools findings.",
      "score": "1.0"
    }
  },
  "neg_answer": {
    "answer": "2004-05-03 is Friday",
    "score": -1.0,
    "actions": "Thought: I need to search what day of the week is 2004-05-03. Action: Calendar.week_day Action Input: 2004-05-03 Observation: Monday",
    "score_agent": {
      "explanation": "The provided answer indicated that May 3, 2004, was a Friday, which contradicts the result of the Calendar tool that clearly specifies that the date in question was a Monday. The glaring difference between the two responses points to the fact that the given answer is wrong.",
      "score": "-1.0"
    }
  }
}
```
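The instance above stores the tool trace as a single string with `Thought`, `Action`, `Action Input`, and `Observation` labels, and it stores `score_agent.score` as a string while the top-level `score` is a number. The sketch below shows one way to split such a trace into labeled steps and to sanity-check that the preferred answer outscores the rejected one. The helper names `parse_actions` and `is_consistent_pair` are hypothetical, and the four labels are assumed from this one example. Other subsets may use different labels or multi-step traces.

```python
import re

# Field labels as they appear in the example's "actions" string.
_LABELS = re.compile(r"(Thought|Action Input|Action|Observation):\s*")


def parse_actions(actions):
    """Split an actions trace into ordered (label, text) pairs."""
    parts = _LABELS.split(actions)
    # parts[0] is any text before the first label; after that, labels and
    # their contents alternate.
    return [(label, text.strip())
            for label, text in zip(parts[1::2], parts[2::2])]


def is_consistent_pair(instance):
    """Check that the preferred answer is scored above the rejected one."""
    pos, neg = instance["pos_answer"], instance["neg_answer"]
    # score_agent.score is a string in the example, so cast before comparing.
    return (float(pos["score"]) > float(neg["score"])
            and float(pos["score_agent"]["score"])
            > float(neg["score_agent"]["score"]))


trace = ("Thought: I need to search what day of the week is 2004-05-03. "
         "Action: Calendar.week_day Action Input: 2004-05-03 "
         "Observation: Monday")

# Trimmed copy of the instance above (answers and explanations omitted).
instance = {
    "pos_answer": {"score": 1.0, "score_agent": {"score": "1.0"}},
    "neg_answer": {"score": -1.0, "score_agent": {"score": "-1.0"}},
}

steps = parse_actions(trace)
for label, text in steps:
    print(f"{label}: {text}")
print(is_consistent_pair(instance))  # True
```

Returning a list of pairs rather than a dictionary keeps the order of steps and avoids overwriting repeated labels if a trace contains more than one tool call.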




