Columbia-NLP/DPO-distilabel-capybara-dpo-7k-binarized

Name: Columbia-NLP/DPO-distilabel-capybara-dpo-7k-binarized
Creator: Columbia-NLP
Published: 2024-07-10 16:05:35
License: No description available

Source: Hugging Face, updated 2024-07-10, indexed 2024-07-06

Download link:

https://hf-mirror.com/datasets/Columbia-NLP/DPO-distilabel-capybara-dpo-7k-binarized


Summary:


This dataset is reformatted from argilla/distilabel-capybara-dpo-7k-binarized to fit the common format shared by all DPO datasets in this collection. Two changes were made: every rating was rescaled from the [1,5] scale to the [1,10] scale (rating * 2), and every row whose chosen response is identical to its rejected response was removed. The dataset is intended for training language models in three stages: SFT, DPO, and online preference learning (online DPO). Building only on publicly available datasets keeps the resulting models reproducible.
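As background for the DPO stage mentioned above, here is a minimal sketch of the standard per-pair DPO loss. It assumes you already have the summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model. The function name `dpo_loss`, its arguments, and the default `beta=0.1` are illustrative assumptions, not part of this dataset or of any particular training library.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed temperature; tune for your setup
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); split by sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is 0 and the loss is log 2; the loss falls below that as the policy prefers the chosen response more strongly than the reference does.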

Provider:

Columbia-NLP

Summary of original information

Dataset overview

Dataset information

Features

prompt: string
prompt_id: string
chosen: list
- content: string
- role: string
rejected: list
- content: string
- role: string
messages: list
- content: string
- role: string
score_chosen: float
score_rejected: float
other_info: struct
- chosen-model: string
- generation_prompt: sequence of strings
- new_generations: sequence of strings
- original_response: string
- rejected-model: string
- source: string
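To sanity-check a record against the feature list above, a small validator can help. This is a sketch assuming rows arrive as plain Python dicts (as the `datasets` library returns them); `validate_record` and the helper names are hypothetical, and `other_info` is only checked for the presence of its six fields, not their value types.

```python
from typing import Any

# Keys of each message dict in chosen / rejected / messages.
MESSAGE_KEYS = {"content", "role"}

# Sub-fields of other_info, taken from the feature list.
OTHER_INFO_KEYS = {
    "chosen-model",
    "generation_prompt",
    "new_generations",
    "original_response",
    "rejected-model",
    "source",
}


def _is_message_list(value: Any) -> bool:
    """True if value is a list of dicts with exactly content/role keys, both strings."""
    return isinstance(value, list) and all(
        isinstance(m, dict)
        and set(m) == MESSAGE_KEYS
        and all(isinstance(m[k], str) for k in MESSAGE_KEYS)
        for m in value
    )


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems = []
    for key in ("prompt", "prompt_id"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} should be a string")
    for key in ("chosen", "rejected", "messages"):
        if not _is_message_list(record.get(key)):
            problems.append(f"{key} should be a list of {{content, role}} messages")
    for key in ("score_chosen", "score_rejected"):
        # Strict check: an int score is reported as a problem.
        if not isinstance(record.get(key), float):
            problems.append(f"{key} should be a float")
    other = record.get("other_info")
    if not isinstance(other, dict) or set(other) != OTHER_INFO_KEYS:
        problems.append("other_info should be a struct with the six listed fields")
    return problems
```

If the schema above is accurate, running `validate_record` over the train split should return an empty list for every row.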

Data splits

train:
- Bytes: 273652414
- Examples: 7562

Dataset size

Download size: 116692036 bytes
Dataset size: 273652414 bytes
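The size figures above support a quick back-of-the-envelope check. This sketch assumes, as is usual for Hugging Face dataset metadata, that the download size counts the files fetched from the Hub (typically compressed shards) and the dataset size counts the decoded data; the constant names are illustrative.

```python
# Figures copied from the split and size listing above (all in bytes).
NUM_BYTES = 273_652_414       # train split size, equal to the dataset size
DOWNLOAD_BYTES = 116_692_036  # download size
NUM_EXAMPLES = 7_562          # train examples

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_BYTES / NUM_BYTES

print(f"average size per example: {avg_bytes_per_example / 1024:.1f} KiB")
print(f"download is {download_ratio:.0%} of the dataset size")
```

This works out to roughly 35 KiB per example, with the download at about 43% of the dataset size, which is consistent with the long multi-turn conversations stored in each row.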

Configuration

config_name: default
- data_files:
  - split: train
    path: data/train-*
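The `data/train-*` pattern in `data_files` is a glob, so files under `data/` whose names start with `train-` are assigned to the train split. The sketch below approximates that matching with Python's `fnmatch`; the file names are hypothetical examples rather than a listing of the actual repository, and `fnmatch`'s `*` also matches `/`, unlike a filesystem glob.

```python
from fnmatch import fnmatch

# Pattern taken from the default config above.
TRAIN_PATTERN = "data/train-*"

# Hypothetical repository paths, for illustration only.
repo_files = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the paths the train pattern selects.
train_files = [path for path in repo_files if fnmatch(path, TRAIN_PATTERN)]
print(train_files)  # ['data/train-00000-of-00001.parquet']
```

In practice, `datasets.load_dataset("Columbia-NLP/DPO-distilabel-capybara-dpo-7k-binarized", split="train")` performs this resolution for you.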

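Dataset description

This dataset is reformatted from the argilla/distilabel-capybara-dpo-7k-binarized dataset, with the following changes:

Every rating was replaced with rating * 2, so that scores fall on the same [1,10] scale used by the other DPO datasets.
Every row in which chosen and rejected are identical was removed.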
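The two changes can be sketched as a single pass over the original records. This assumes each source record exposes `chosen`, `rejected`, `rating_chosen`, and `rating_rejected` fields; the two rating field names and the `convert` function are hypothetical, and only `score_chosen`/`score_rejected` come from the schema above.

```python
from typing import Any


def convert(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rescale 1-5 ratings onto the 1-10 scale and drop pairs whose chosen equals rejected."""
    converted = []
    for rec in records:
        # Identical responses carry no preference signal, so the row is skipped.
        if rec["chosen"] == rec["rejected"]:
            continue
        converted.append(
            {
                "chosen": rec["chosen"],
                "rejected": rec["rejected"],
                # rating * 2 maps 1..5 onto 2..10, inside the shared [1,10] range.
                "score_chosen": float(rec["rating_chosen"] * 2),
                "score_rejected": float(rec["rating_rejected"] * 2),
            }
        )
    return converted
```

Equality here compares the full message lists; the description does not say whether the original filter compared whole conversations or only the final responses.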
