vincentmin/eli5_rlhf

Name: vincentmin/eli5_rlhf
Creator: vincentmin
Published: 2023-04-10 07:58:18
License: No description provided

Hugging Face | Updated: 2023-04-10 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/vincentmin/eli5_rlhf

Resource description:

---
task_categories:
- conversational
- text2text-generation
- text-generation
- question-answering
language:
- en
tags:
- rlhf
- reinforcement learning from human feedback
pretty_name: >-
  Reddit Explain Like I am Five dataset for Reinforcement Learning from Human
  Feedback
size_categories:
- 1M<n<10M
---

ELI5 paired

This is a processed version of the [eli5](https://huggingface.co/datasets/eli5) dataset. It was created by closely following the steps used for the [stack-exchange-paired dataset](https://huggingface.co/datasets/lvwerra/stack-exchange-paired). The following steps were applied:

- Create pairs (response_j, response_k) where j was rated better than k
- Sample at most 10 pairs per question
- Shuffle the dataset globally

This dataset is designed for preference learning with techniques such as Reinforcement Learning from Human Feedback. The processing notebook is included in the repository.

If you want to construct a "question" column in this data, you can either use just the "title" column, or concatenate the "title" column with the "selftext" column as follows:

```
from datasets import load_dataset


def get_question(example):
    title = example["title"]
    selftext = example["selftext"]
    if selftext:
        # The separator is chosen from the last character of selftext.
        if selftext[-1] not in [".", "?", "!"]:
            separator = ". "
        else:
            separator = " "
        question = title + separator + selftext
    else:
        # No body text: the title alone is the question.
        question = title
    example["question"] = question
    return example


dataset = load_dataset("vincentmin/eli5_askscience_askhistorians_rlhf")
dataset = dataset.map(get_question)
```

For the license, see the [eli5 dataset](https://huggingface.co/datasets/eli5), which stated at the time this dataset was created: "The licensing status of the dataset hinges on the legal status of the Pushshift.io data which is unclear."

Provider:

vincentmin

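Summary of original information

Dataset overview

Basic information

Task categories: conversational, text2text-generation, text-generation, question-answering
Language: English
Tags: RLHF (reinforcement learning from human feedback)
Pretty name: Reddit Explain Like I am Five dataset for Reinforcement Learning from Human Feedback
Size category: 1M<n<10M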

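Dataset description

Source: this dataset is a processed version of the eli5 dataset, built by following the steps used to create the stack-exchange-paired dataset.
Processing steps:
- Create pairs (response_j, response_k) where j was rated better than k.
- Sample at most 10 pairs per question.
- Shuffle the dataset globally.
Intended use: preference learning, in particular with techniques such as Reinforcement Learning from Human Feedback.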
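
The three processing steps above can be illustrated with a minimal sketch. This is not the author's processing notebook (that lives in the dataset repository); the function names `build_pairs` and `build_dataset`, the `(text, score)` input shape, and the choice to skip tied scores are all assumptions made for illustration.

```python
import itertools
import random


def build_pairs(answers, max_pairs=10, rng=None):
    """Return (response_j, response_k) pairs where j scored higher than k.

    `answers` is a list of (text, score) tuples for a single question.
    This input shape is hypothetical, chosen only for the sketch.
    """
    if rng is None:
        rng = random.Random(0)
    pairs = []
    for (text_a, score_a), (text_b, score_b) in itertools.combinations(answers, 2):
        if score_a == score_b:
            continue  # assumption: ties carry no preference and are skipped
        if score_a > score_b:
            better, worse = text_a, text_b
        else:
            better, worse = text_b, text_a
        pairs.append({"response_j": better, "response_k": worse})
    # Step 2: keep at most `max_pairs` pairs per question.
    if len(pairs) > max_pairs:
        pairs = rng.sample(pairs, max_pairs)
    return pairs


def build_dataset(questions, max_pairs=10, seed=0):
    """Pair the answers of every question, then shuffle all rows globally."""
    rng = random.Random(seed)
    rows = []
    for title, answers in questions.items():
        for pair in build_pairs(answers, max_pairs, rng):
            rows.append({"title": title, **pair})
    rng.shuffle(rows)  # Step 3: global shuffle across questions
    return rows
```

Capping the pairs per question keeps questions with many answers from dominating the data, since the number of possible pairs grows quadratically with the number of answers.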

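Dataset construction

Question construction: a "question" column can be constructed in either of two ways:
- Use only the "title" column.
- Concatenate the "title" column with the "selftext" column, as follows:

```python
def get_question(example):
    title = example["title"]
    selftext = example["selftext"]
    if selftext:
        # The separator is chosen from the last character of selftext.
        if selftext[-1] not in [".", "?", "!"]:
            separator = ". "
        else:
            separator = " "
        question = title + separator + selftext
    else:
        # No body text: the title alone is the question.
        question = title
    example["question"] = question
    return example
```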

License information

License status: the licensing status of the dataset hinges on the legal status of the Pushshift.io data, which is currently unclear.
