
baidu/TARA

Source: Hugging Face. Updated 2024-02-20. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/baidu/TARA
Resource description:
TARA is a novel tool-augmented reward modeling dataset that contains comprehensive human preference comparison data together with detailed tool invocation processes. The dataset supports a variety of tools: a calculator, a code tool, a translator, Google Search, a calendar, a weather tool, Wikipedia search, and multi-tool scenarios. It is organized into subsets, each holding preference data for a particular tool or source. The `calculator`, `code`, `calendar`, `weather`, and `translator` subsets contain preference data for the calculator, code, calendar, weather, and translator tools respectively; the `webgpt` subset contains preference data for the Google Search tool; the `wikisearch` subset contains preference data for the Wikipedia search tool; the `multi_tools` subset contains preference data that involves both the calendar and weather tools; the `mix` subset aggregates all of the above; the `hh_rlhf` subset consists of 150 instances randomly selected from the Anthropic/hh-rlhf dataset; the `mix_hh` subset aggregates `hh_rlhf` and `mix`; the `truthful_qa` subset is the truthful_qa dataset converted to preference format; and the `retarded_bar` subset is the English, preference-format version of the retarded_bar dataset.
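Provider:
baidu
Summary of original information

Dataset Card for TARA

Dataset Overview

TARA is a novel tool-augmented reward modeling dataset that contains comprehensive comparison data of human preferences together with detailed tool invocation processes. The dataset was introduced in its accompanying paper and was used to train Themis-7b.

Supported Tools

TARA supports multiple tools, including Calculator, Code, Translator, Google Search, Calendar, Weather, WikiSearch, and Multi-tools.

Dataset Structure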

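  • calculator: preference data involving the Calculator tool, constructed from GSM-8K.
  • code: preference data involving the Code tool, containing questions, positive code answers, and test cases.
  • calendar: preference data involving the Calendar tool, whose main functions are determining the day of the week for a given date, computing the difference between two dates, and finding the date that falls n days after another date.
  • weather: preference data involving the Weather tool, which takes a city and a date as input and outputs the weather in that city on that date.
  • translator: preference data involving the Translator tool, containing QA instances in 7 different languages.
  • webgpt: preference data involving the Google Search tool, constructed from the reward dataset webgpt_comparisons.
  • wikisearch: preference data involving the WikiSearch tool, containing real, anonymized, aggregated queries issued to the Google search engine and annotated with Wikipedia pages.
  • multi_tools: preference data involving both the calendar and weather tools.
  • mix: an aggregation of all of the above data.
  • hh_rlhf: 150 instances randomly selected from Anthropic/hh-rlhf.
  • mix_hh: an aggregation of hh_rlhf and mix.
  • truthful_qa: the preference-format version of truthful_qa.
  • retarded_bar: the English, preference-format version of retarded_bar.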
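
The three Calendar functions described in the list above can be sketched with Python's standard library. This is only an illustration of what such a tool computes, not the tool used to build TARA. The name `week_day` echoes the `Calendar.week_day` action that appears in the dataset; `days_between` and `date_after` are hypothetical names chosen here, and ISO `YYYY-MM-DD` input is an assumption based on the sample question.

```python
from datetime import date, timedelta

# English day names indexed by date.weekday() (Monday == 0), so the result
# does not depend on the system locale.
_DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]


def week_day(day: str) -> str:
    """Return the day of the week for an ISO date string such as '2004-05-03'."""
    return _DAY_NAMES[date.fromisoformat(day).weekday()]


def days_between(start: str, end: str) -> int:
    """Return the number of days from start to end (negative if end is earlier)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def date_after(day: str, n: int) -> str:
    """Return the ISO date that falls n days after the given date."""
    return (date.fromisoformat(day) + timedelta(days=n)).isoformat()


if __name__ == "__main__":
    print(week_day("2004-05-03"))                    # Monday
    print(days_between("2004-05-03", "2004-05-10"))  # 7
    print(date_after("2004-05-03", 30))              # 2004-06-02
```

A deterministic tool like this is what lets the observation in an action trace contradict a wrong answer, which is the signal the preference data records.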

Data Instances

An example from TARA looks as follows:

    {
      "id": "calendar_train_135",
      "question": "Please let me know the day of the week that 2004-05-03 refers to.",
      "pos_answer": {
        "answer": "2004-05-03 is Monday",
        "score": 1.0,
        "actions": "Thought: I need to search what day of the week is 2004-05-03. Action: Calendar.week_day Action Input: 2004-05-03 Observation: Monday",
        "score_agent": {
          "explanation": "The answer given, which states that the date 2004-05-03 is a Monday, aligns perfectly with the tools execution result that also shows that the day was a Monday. This means that the provided answer is accurate and well-supported by the tools findings.",
          "score": "1.0"
        }
      },
      "neg_answer": {
        "answer": "2004-05-03 is Friday",
        "score": -1.0,
        "actions": "Thought: I need to search what day of the week is 2004-05-03. Action: Calendar.week_day Action Input: 2004-05-03 Observation: Monday",
        "score_agent": {
          "explanation": "The provided answer indicated that May 3, 2004, was a Friday, which contradicts the result of the Calendar tool that clearly specifies that the date in question was a Monday. The glaring difference between the two responses points to the fact that the given answer is wrong.",
          "score": "-1.0"
        }
      }
    }
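
As a rough sketch of how an instance like the one above might be consumed, the Python below splits an `actions` string into its labeled steps and turns an instance into a (prompt, chosen, rejected) triple. The field names come from the example; the helper names `parse_actions` and `to_preference_pair`, the regular expression, and the trimmed `sample` record (with `score_agent` omitted) are assumptions made for illustration and are not part of any official TARA tooling.

```python
import re

# Labels seen in the example trace. "Action Input" is listed before "Action"
# so the longer label is tried first.
_LABELS = r"(?:Thought|Action Input|Action|Observation)"
_STEP = re.compile(rf"({_LABELS}):\s*(.*?)(?=\s*{_LABELS}:|$)", re.S)


def parse_actions(actions: str) -> list[tuple[str, str]]:
    """Split a trace into (label, text) pairs in order of appearance."""
    return [(label, text.strip()) for label, text in _STEP.findall(actions)]


def to_preference_pair(instance: dict) -> dict:
    """Build a prompt/chosen/rejected record from one instance."""
    pos, neg = instance["pos_answer"], instance["neg_answer"]
    # Sanity check: the preferred answer should carry the higher score.
    if float(pos["score"]) <= float(neg["score"]):
        raise ValueError(f"unexpected score order in {instance['id']}")
    return {
        "prompt": instance["question"],
        "chosen": pos["answer"],
        "rejected": neg["answer"],
    }


# Trimmed copy of the example instance (score_agent fields omitted).
_TRACE = ("Thought: I need to search what day of the week is 2004-05-03. "
          "Action: Calendar.week_day Action Input: 2004-05-03 "
          "Observation: Monday")
sample = {
    "id": "calendar_train_135",
    "question": "Please let me know the day of the week that 2004-05-03 refers to.",
    "pos_answer": {"answer": "2004-05-03 is Monday", "score": 1.0, "actions": _TRACE},
    "neg_answer": {"answer": "2004-05-03 is Friday", "score": -1.0, "actions": _TRACE},
}

if __name__ == "__main__":
    for label, text in parse_actions(sample["pos_answer"]["actions"]):
        print(f"{label}: {text}")
    print(to_preference_pair(sample))
```

Note that in this example both answers share the same tool trace; only the final answer differs, so the observation (`Monday`) is what separates the chosen response from the rejected one.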

The content above was collected and summarized by 遇见数据集.