five

lightblue/mitsu_top25_borda

收藏
Hugging Face2024-05-30 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/lightblue/mitsu_top25_borda
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: features: - name: prompt dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: rejected list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 2025936 num_examples: 674 download_size: 1061721 dataset_size: 2025936 configs: - config_name: default data_files: - split: train path: data/train-* license: cc-by-nc-4.0 --- # Mitsu <p align="center"> <img width=400 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/Ypd0x0ZyVCJs7rkd5xA_O.png" alt="Mitsu - a honey bee in its comb"/> </p> [[Paper]](https://arxiv.org/abs/2405.18952) [[Model]](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half) This is a multilingual preference dataset generated using human written prompts and responses from 7 LLMs. We evaluate each set of responses 5 times using GPT4. Note that this model has a non-commerical license as we used the Command R and Command R+ models to create this data. We are currently working on a developing a commerically usable model, so stay tuned for that! # Dataset details This is the ORPO training dataset derived from the [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu). This dataset contains the prompts corresponding to the 25\% most consistely ranked responses by GPT, with the highest/lowest ranked responses used as the positive and negative responses for each prompt. # How we made this: We made this dataset using our Repeated Ranking method, which entails the following steps: 1. Sample responses from [lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4), stratifying by language by sampling 100 per language 2. Generate responses for each prompt using each of the following models: * gpt-35-turbo-instruct (0914) * gpt-4 (0125-Preview) * Nexusflow/Starling-LM-7B-beta * Qwen/Qwen1.5-32B-Chat * Qwen/Qwen1.5-72B-Chat * CohereForAI/c4ai-command-r-v01 * CohereForAI/c4ai-command-r-plus 3. Evaluate the responses using gpt-4 (0125-Preview) 5 times, randomly shuffling the order that the responses are given in each time 4. Calculate the agreement between the rankings using Kendall's W The full code for creating this dataset can be [found on our repo](https://github.com/lightblue-tech/suzume/tree/main/mitsu/data_creation). # How to use it: We process this dataset into datasets usable for DPO/PPO/ORPO training using the [code available on our repo](https://github.com/lightblue-tech/suzume/blob/main/mitsu/data_creation/response_rank_process.ipynb). Processed versions of this dataset can be found at: * [All prompt dataset](https://huggingface.co/datasets/lightblue/mitsu_full_borda) * [Prompts with top 75% most repeated consistent evaluations](https://huggingface.co/datasets/lightblue/mitsu_top75_borda) * [Prompts with top 50% most repeated consistent evaluations (recommended for training)](https://huggingface.co/datasets/lightblue/mitsu_tophalf_borda) * [Prompts with top 25% most repeated consistent evaluations](https://huggingface.co/datasets/lightblue/mitsu_top25_borda) # Dataset results We conducted experiments by training our [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) using this dataset with ORPO training. We also conduct experiments where we sample varying fractions of the dataset, ordered by the consistency of the 5 rankings that the evaluator model gave (as described in the diagram below). <p align="center"> <img width=800 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/Ccz6V2G7zCmfZWXuHK0x3.png" alt="Diagram describing our repeated ranking methodology"/> </p> We train using the top 75%, 50%, and 25% most consistently ranked responses, and compare that to training on all responses. We find that training on less data can actually result in greater down stream accuracy for down-stream tasks, such as the MT-Bench scores in 6 languages that we test on: <p align="center"> <img width=700 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/JahHDC6xcgbz3Ej2ZrWjQ.png" alt="MT-Bench results for our ORPO experiments"/> </p> # How to cite ```tex @article{devine2024sure, title={Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets}, author={Devine, Peter}, journal={arXiv preprint arXiv:2405.18952}, year={2024} } ``` # Developer Peter Devine - ([ptrdvn](https://huggingface.co/ptrdvn))
提供机构:
lightblue
原始信息汇总

数据集概述

数据集特征

  • prompt: 数据类型为字符串。
  • chosen: 包含两个子特征
    • content: 数据类型为字符串。
    • role: 数据类型为字符串。
  • rejected: 包含两个子特征
    • content: 数据类型为字符串。
    • role: 数据类型为字符串。

数据集分割

  • train: 包含674个样本,数据集大小为2025936字节。

数据集大小与下载大小

  • dataset_size: 2025936字节。
  • download_size: 1061721字节。

许可证

  • license: cc-by-nc-4.0

数据集创建方法

  1. lightblue/tagengo-gpt4采样响应,按语言分层采样100个每种语言。
  2. 使用以下模型为每个提示生成响应:
    • gpt-35-turbo-instruct (0914)
    • gpt-4 (0125-Preview)
    • Nexusflow/Starling-LM-7B-beta
    • Qwen/Qwen1.5-32B-Chat
    • Qwen/Qwen1.5-72B-Chat
    • CohereForAI/c4ai-command-r-v01
    • CohereForAI/c4ai-command-r-plus
  3. 使用gpt-4 (0125-Preview)评估响应5次,每次随机打乱响应的顺序。
  4. 使用Kendalls W计算排名之间的一致性。

数据集使用

  • 数据集用于DPO/PPO/ORPO训练,处理代码可从此处获取。

处理后的数据集版本

实验结果

  • 使用lightblue/suzume-llama-3-8B-multilingual进行ORPO训练。
  • 实验表明,使用较少的数据(如75%、50%、25%最一致的排名响应)进行训练,可以提高下游任务的准确性。
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作