Columbia-NLP/DPO-hh-rlhf

Name: Columbia-NLP/DPO-hh-rlhf
Creator: Columbia-NLP
Published: 2024-07-10 16:09:17
License: 暂无描述

Hugging Face2024-07-10 更新2024-07-06 收录

下载链接：

https://hf-mirror.com/datasets/Columbia-NLP/DPO-hh-rlhf

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是从Anthropic/hh-rlhf数据集重新格式化而来，用于训练LION-series模型。重新格式化的过程包括为所有chosen和rejected响应设置了固定的分数占位符（chosen为10，rejected为1）。该数据集包含prompt、prompt_id、chosen、rejected、messages等字段，并分为train和test两个分区。

This dataset is reformatted from the Anthropic/hh-rlhf dataset, used to support the training of the LION series models. The dataset includes multiple fields such as prompt, prompt_id, chosen, rejected, messages, score_chosen, score_rejected, and other_info, each with specific data types and structures. The dataset is divided into training and test sets, containing different numbers of samples and bytes.

提供机构：

Columbia-NLP

原始信息汇总

数据集概述

数据集信息

特征

prompt: 字符串类型
prompt_id: 字符串类型
chosen: 列表类型
- content: 字符串类型
- role: 字符串类型
rejected: 列表类型
- content: 字符串类型
- role: 字符串类型
messages: 列表类型
- content: 字符串类型
- role: 字符串类型
score_chosen: 浮点数类型
score_rejected: 浮点数类型
other_info: 结构体类型
- source: 字符串类型

数据分割

train:
- 字节数: 501881444
- 样本数: 160800
test:
- 字节数: 26966851
- 样本数: 8552

数据集大小

下载大小: 295013949
数据集大小: 528848295

配置

config_name: default
- data_files:
  - train: data/train-*
  - test: data/test-*

数据集描述

该数据集是从Anthropic/hh-rlhf数据集重新格式化而来。
由于原始数据集没有评分，所有选定的响应评分设为“10”，所有被拒绝的响应评分设为“1”作为占位符。

5,000+

优质数据集

54 个

任务类型

进入经典数据集