argilla/distilabel-capybara-kto-15k-binarized
收藏Hugging Face2024-03-19 更新2024-06-11 收录
下载链接:
https://hf-mirror.com/datasets/argilla/distilabel-capybara-kto-15k-binarized
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- conversational
- question-answering
- text-generation
pretty_name: CapybaraDPO-7k
tags:
- Physics
- Biology
- Math
- Chemistry
- Culture
- Logic
- Roleplay
- rlaif
- rlhf
- kto
- distilabel
- synthetic
dataset_info:
features:
- name: prompt
dtype: string
- name: completion
list:
- name: content
dtype: string
- name: role
dtype: string
- name: label
dtype: bool
- name: rating
dtype: int64
- name: model
dtype: string
- name: source
dtype: string
splits:
- name: train
num_bytes: 129692808
num_examples: 15126
download_size: 42545061
dataset_size: 129692808
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
# Capybara-KTO 15K binarized
> A KTO signal transformed version of the highly loved [Capybara-DPO 7K binarized](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized), A DPO dataset built with [distilabel](https://github.com/argilla-io/distilabel) atop the awesome [LDJnr/Capybara](https://huggingface.co/datasets/LDJnr/Capybara)
> This is a preview version to collect feedback from the community. v2 will include the full base dataset and responses from more powerful models.
<div>
<img src="https://cdn-uploads.huggingface.co/production/uploads/60420dccc15e823a685f2b03/Vmr0FtTvnny6Snm-UDM_n.png">
</div>
<p align="center">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
</a>
</p>
## Why KTO?
The [KTO paper](https://arxiv.org/abs/2402.01306) states:
- KTO matches or exceeds DPO performance at scales from 1B to 30B parameters.1 That is, taking a preference dataset of n DPO pairs and breaking it up into 2n examples for KTO can yield better generations, despite the model ostensibly learning from a weaker signal.
- KTO can handle extreme data imbalances, matching DPO performance while using up to 90% fewer desirable examples (i.e., examples of good generations). Its success thus cannot be ascribed to the alignment data being sourced from a preference dataset.
- When the pretrained model is sufficiently good, one can skip supervised finetuning and go straight to KTO without a loss in generation quality. In contrast, we find that without doing SFT first, DPO-aligned models are significantly worse at all scales.
## Reproduce KTO Transformation
Original [distilabel Capybara-DPO 7K binarized](https://huggingface.co/datasets/argilla/distilabel-capybara-dpo-7k-binarized)
<a target="_blank" href="https://colab.research.google.com/drive/1xmc2q966UrLoHwZ4g-2Wd9qKzQLF-IJm?usp=sharing">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
提供机构:
argilla
原始信息汇总
数据集概述
基本信息
- 语言: 英语
- 许可证: Apache 2.0
- 数据集大小: 1K<n<10K
- 任务类别: 对话、问答、文本生成
- 标签: 物理、生物、数学、化学、文化、逻辑、角色扮演、rlaif、rlhf、kto、distilabel、合成
数据集详情
- 特征:
- prompt: 字符串类型
- completion: 列表类型
- content: 字符串类型
- role: 字符串类型
- label: 布尔类型
- rating: 64位整数类型
- model: 字符串类型
- source: 字符串类型
- 分割:
- train: 15126个样本,129692808字节
- 下载大小: 42545061字节
- 数据集大小: 129692808字节
配置
- 默认配置:
- 数据文件:
- train: data/train-*
- 数据文件:



