five

lightblue/mitsu_tophalf_borda

收藏
Hugging Face2024-05-30 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/lightblue/mitsu_tophalf_borda
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: features: - name: prompt dtype: string - name: chosen list: - name: content dtype: string - name: role dtype: string - name: rejected list: - name: content dtype: string - name: role dtype: string splits: - name: test num_bytes: 216048.79407407407 num_examples: 68 - name: train num_bytes: 4289220 num_examples: 1350 download_size: 4766811 dataset_size: 4505268.794074073 configs: - config_name: default data_files: - split: train path: data/train-* - split: test path: data/test-* license: cc-by-nc-4.0 --- # Mitsu <p align="center"> <img width=400 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/Ypd0x0ZyVCJs7rkd5xA_O.png" alt="Mitsu - a honey bee in its comb"/> </p> [[Paper]](https://arxiv.org/abs/2405.18952) [[Model]](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual-orpo-borda-half) This is a multilingual preference dataset generated using human written prompts and responses from 7 LLMs. We evaluate each set of responses 5 times using GPT4. Note that this model has a non-commerical license as we used the Command R and Command R+ models to create this data. We are currently working on a developing a commerically usable model, so stay tuned for that! # Dataset details This is the ORPO training dataset derived from the [lightblue/mitsu](https://huggingface.co/datasets/lightblue/mitsu). This dataset contains the prompts corresponding to the 50\% most consistely ranked responses by GPT, with the highest/lowest ranked responses used as the positive and negative responses for each prompt. # How we made this: We made this dataset using our Repeated Ranking method, which entails the following steps: 1. Sample responses from [lightblue/tagengo-gpt4](https://huggingface.co/datasets/lightblue/tagengo-gpt4), stratifying by language by sampling 100 per language 2. Generate responses for each prompt using each of the following models: * gpt-35-turbo-instruct (0914) * gpt-4 (0125-Preview) * Nexusflow/Starling-LM-7B-beta * Qwen/Qwen1.5-32B-Chat * Qwen/Qwen1.5-72B-Chat * CohereForAI/c4ai-command-r-v01 * CohereForAI/c4ai-command-r-plus 3. Evaluate the responses using gpt-4 (0125-Preview) 5 times, randomly shuffling the order that the responses are given in each time 4. Calculate the agreement between the rankings using Kendall's W The full code for creating this dataset can be [found on our repo](https://github.com/lightblue-tech/suzume/tree/main/mitsu/data_creation). # How to use it: We process this dataset into datasets usable for DPO/PPO/ORPO training using the [code available on our repo](https://github.com/lightblue-tech/suzume/blob/main/mitsu/data_creation/response_rank_process.ipynb). Processed versions of this dataset can be found at: * [All prompt dataset](https://huggingface.co/datasets/lightblue/mitsu_full_borda) * [Prompts with top 75% most repeated consistent evaluations](https://huggingface.co/datasets/lightblue/mitsu_top75_borda) * [Prompts with top 50% most repeated consistent evaluations (recommended for training)](https://huggingface.co/datasets/lightblue/mitsu_tophalf_borda) * [Prompts with top 25% most repeated consistent evaluations](https://huggingface.co/datasets/lightblue/mitsu_top25_borda) # Dataset results We conducted experiments by training our [lightblue/suzume-llama-3-8B-multilingual](https://huggingface.co/lightblue/suzume-llama-3-8B-multilingual) using this dataset with ORPO training. We also conduct experiments where we sample varying fractions of the dataset, ordered by the consistency of the 5 rankings that the evaluator model gave (as described in the diagram below). <p align="center"> <img width=800 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/Ccz6V2G7zCmfZWXuHK0x3.png" alt="Diagram describing our repeated ranking methodology"/> </p> We train using the top 75%, 50%, and 25% most consistently ranked responses, and compare that to training on all responses. We find that training on less data can actually result in greater down stream accuracy for down-stream tasks, such as the MT-Bench scores in 6 languages that we test on: <p align="center"> <img width=700 src="https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/JahHDC6xcgbz3Ej2ZrWjQ.png" alt="MT-Bench results for our ORPO experiments"/> </p> # How to cite ```tex @article{devine2024sure, title={Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets}, author={Devine, Peter}, journal={arXiv preprint arXiv:2405.18952}, year={2024} } ``` # Developer Peter Devine - ([ptrdvn](https://huggingface.co/ptrdvn))
提供机构:
lightblue
原始信息汇总

数据集概述

数据集信息

  • 特征(Features)

    • prompt:字符串类型
    • chosen:列表类型,包含
      • content:字符串类型
      • role:字符串类型
    • rejected:列表类型,包含
      • content:字符串类型
      • role:字符串类型
  • 分割(Splits)

    • test:包含68个示例,总字节数为216048.79407407407
    • train:包含1350个示例,总字节数为4289220
  • 下载大小(Download Size):4766811字节

  • 数据集大小(Dataset Size):4505268.794074073字节

配置(Configs)

  • 默认配置(config_name: default)
    • 训练数据文件(split: train):路径为data/train-*
    • 测试数据文件(split: test):路径为data/test-*

许可证(License)

  • cc-by-nc-4.0

数据集创建方法

  • 使用Repeated Ranking方法
    1. lightblue/tagengo-gpt4数据集中按语言抽样100个响应
    2. 使用多种模型生成响应
    3. 使用gpt-4 (0125-Preview)评估响应,每次随机排序
    4. 使用Kendalls W计算排名一致性

数据集使用

实验结果

  • 使用lightblue/suzume-llama-3-8B-multilingual进行ORPO训练
    • 比较了不同比例的响应训练效果
    • 发现使用较少但更一致的数据可以提高下游任务的准确性

引用信息

tex @article{devine2024sure, title={Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets}, author={Devine, Peter}, journal={arXiv preprint arXiv:2405.18952}, year={2024} }

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作