lightblue/mitsu_top25_borda

Name: lightblue/mitsu_top25_borda
Creator: lightblue
Published: 2024-05-30 06:46:03
License: 暂无描述

Hugging Face2024-05-30 更新2024-06-12 收录

下载链接：

https://hf-mirror.com/datasets/lightblue/mitsu_top25_borda

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: prompt dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: rejected list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 2025936 num_examples: 674 download_size: 1061721 dataset_size: 2025936 configs: - config_name: default data_files: - split: train path: data/train-* license: cc-by-nc-4.0 --- # Mitsu <img width=400 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/Ypd0x0ZyVCJs7rkd5xA_O.png" alt="Mitsu - a honey bee in its comb"/> [[Paper]](https://arxiv.org/abs/2405.18952) [[Model]](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half) This is a multilingual preference dataset generated using human written prompts and responses from 7 LLMs. We evaluate each set of responses 5 times using GPT4. Note that this model has a non-commerical license as we used the Command R and Command R+ models to create this data. We are currently working on a developing a commerically usable model, so stay tuned for that! # Dataset details This is the ORPO training dataset derived from the [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu). This dataset contains the prompts corresponding to the 25\% most consistely ranked responses by GPT, with the highest/lowest ranked responses used as the positive and negative responses for each prompt. # How we made this: We made this dataset using our Repeated Ranking method, which entails the following steps: 1. Sample responses from [lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4), stratifying by language by sampling 100 per language 2. Generate responses for each prompt using each of the following models: * gpt-35-turbo-instruct (0914) * gpt-4 (0125-Preview) * Nexusflow/Starling-LM-7B-beta * Qwen/Qwen1.5-32B-Chat * Qwen/Qwen1.5-72B-Chat * CohereForAI/c4ai-command-r-v01 * CohereForAI/c4ai-command-r-plus 3. Evaluate the responses using gpt-4 (0125-Preview) 5 times, randomly shuffling the order that the responses are given in each time 4. Calculate the agreement between the rankings using Kendall's W The full code for creating this dataset can be [found on our repo](https://github.com/lightblue-tech/suzume/tree/main/mitsu/data_creation). # How to use it: We process this dataset into datasets usable for DPO/PPO/ORPO training using the [code available on our repo](https://github.com/lightblue-tech/suzume/blob/main/mitsu/data_creation/response_rank_process.ipynb). Processed versions of this dataset can be found at: * [All prompt dataset](https://huggingface.co/datasets/lightblue/mitsu_full_borda) * [Prompts with top 75% most repeated consistent evaluations](https://huggingface.co/datasets/lightblue/mitsu_top75_borda) * [Prompts with top 50% most repeated consistent evaluations (recommended for training)](https://huggingface.co/datasets/lightblue/mitsu_tophalf_borda) * [Prompts with top 25% most repeated consistent evaluations](https://huggingface.co/datasets/lightblue/mitsu_top25_borda) # Dataset results We conducted experiments by training our [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) using this dataset with ORPO training. We also conduct experiments where we sample varying fractions of the dataset, ordered by the consistency of the 5 rankings that the evaluator model gave (as described in the diagram below). <img width=800 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/Ccz6V2G7zCmfZWXuHK0x3.png" alt="Diagram describing our repeated ranking methodology"/> We train using the top 75%, 50%, and 25% most consistently ranked responses, and compare that to training on all responses. We find that training on less data can actually result in greater down stream accuracy for down-stream tasks, such as the MT-Bench scores in 6 languages that we test on: <img width=700 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/JahHDC6xcgbz3Ej2ZrWjQ.png" alt="MT-Bench results for our ORPO experiments"/> # How to cite ```tex @article{devine2024sure, title={Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets}, author={Devine, Peter}, journal={arXiv preprint arXiv:2405.18952}, year={2024} } ``` # Developer Peter Devine - ([ptrdvn](https://huggingface.co/ptrdvn))

提供机构：

lightblue

原始信息汇总

数据集概述

数据集特征

prompt: 数据类型为字符串。
chosen: 包含两个子特征
- content: 数据类型为字符串。
- role: 数据类型为字符串。
rejected: 包含两个子特征
- content: 数据类型为字符串。
- role: 数据类型为字符串。

数据集分割

train: 包含674个样本，数据集大小为2025936字节。

数据集大小与下载大小

dataset_size: 2025936字节。
download_size: 1061721字节。

许可证

license: cc-by-nc-4.0

数据集创建方法

从lightblue/tagengo-gpt4采样响应，按语言分层采样100个每种语言。
使用以下模型为每个提示生成响应：
- gpt-35-turbo-instruct (0914)
- gpt-4 (0125-Preview)
- Nexusflow/Starling-LM-7B-beta
- Qwen/Qwen1.5-32B-Chat
- Qwen/Qwen1.5-72B-Chat
- CohereForAI/c4ai-command-r-v01
- CohereForAI/c4ai-command-r-plus
使用gpt-4 (0125-Preview)评估响应5次，每次随机打乱响应的顺序。
使用Kendalls W计算排名之间的一致性。

数据集使用

数据集用于DPO/PPO/ORPO训练，处理代码可从此处获取。

处理后的数据集版本

实验结果

使用lightblue/suzume-llama-3-8B-multilingual进行ORPO训练。
实验表明，使用较少的数据（如75%、50%、25%最一致的排名响应）进行训练，可以提高下游任务的准确性。

5,000+

优质数据集

54 个

任务类型

进入经典数据集