when2rl/distilabel-intel-orca-dpo-pairs_cleaned_reformatted

Name: when2rl/distilabel-intel-orca-dpo-pairs_cleaned_reformatted
Creator: when2rl
Published: 2024-04-17 00:45:15
License: 暂无描述

Hugging Face2024-04-17 更新2024-06-22 收录

下载链接：

https://hf-mirror.com/datasets/when2rl/distilabel-intel-orca-dpo-pairs_cleaned_reformatted

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集是argilla的distilabel-intel-orca-dpo-pairs的清理和重新格式化版本。清理过程包括移除所有没有评分信息的样本，重新格式化数据集以与ultrafeedback_binarized保持一致，翻转了`chosen`、`rejected`和`message`列中的“system”与“assistant”角色，并移除了`chosen`与`rejected`相同的行。数据集包含多个特征，如prompt、prompt_id、chosen、rejected、messages等，并且包含一个训练集，共有12815个样本。

提供机构：

when2rl

原始信息汇总

数据集卡片 for distilabel-intel-orca-dpo-pairs_cleaned_reformatted

数据集详情

数据集描述

这是一个经过“清洗”和重新格式化的 argilla 的 distilabel-intel-orca-dpo-pairs 版本：

移除了所有 rating 为 none 的样本。共有 42 个样本没有评级信息。
重新格式化数据集，使其与 ultrafeedback_binarized 一致。
(新) 在 chosen、rejected 和 message 列中将 "system" 与 "assistant" 角色互换。原始数据集使用 "systems" 表示聊天机器人响应。
(新) 移除了所有 chosen 与 rejected 相同的行。这从训练集中移除了 2 行。

数据集结构

特征

prompt: 字符串
prompt_id: 字符串
chosen: 列表
- content: 字符串
- role: 字符串
rejected: 列表
- content: 字符串
- role: 字符串
messages: 列表
- content: 字符串
- role: 字符串
score_chosen: 浮点数 (float64)
score_rejected: 浮点数 (float64)
other_info: 结构体
- in_gsm8k_train: 布尔值
- labeling_model: 字符串
- labeling_prompt: 列表
  - content: 字符串
  - role: 字符串
- original_chosen: 字符串
- original_rejected: 字符串
- score_is_flipped: 布尔值
- system: 字符串

分割

train:
- num_bytes: 145232429.99453852
- num_examples: 12815

数据大小

download_size: 74069814
dataset_size: 145232429.99453852

配置

config_name: default
data_files:
- split: train
- path: data/train-*

5,000+

优质数据集

54 个

任务类型

进入经典数据集