Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2

Name: Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2
Creator: Cornell-AGI
Published: 2024-10-08 18:06:16
License: 暂无描述

Hugging Face2024-10-08 更新2025-04-12 收录

下载链接：

https://hf-mirror.com/datasets/Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: chosen list: - name: content dtype: string - name: role dtype: string - name: reject list: - name: content dtype: string - name: role dtype: string - name: chosen_token sequence: int64 - name: reject_token sequence: int64 - name: chosen_mask sequence: int64 - name: reject_mask sequence: int64 - name: chosen_reward dtype: float64 - name: reject_reward dtype: float64 splits: - name: train num_bytes: 8521071947 num_examples: 116117 download_size: 626010383 dataset_size: 8521071947 configs: - config_name: default data_files: - split: train path: data/train-* --- This is a dataset released for our paper: [Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF](https://arxiv.org/abs/2410.04612). # REFUEL-Ultrainteract-Llama-3-Armo-iter_2 This dataset contains dialogues using [REFUEL-Llama-3-Armo-iter_1](https://huggingface.co/Cornell-AGI/REFUEL-Llama-3-Armo-iter_1) as the assistant and [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the user. The dataset is used to train [REFUEL-Llama-3-Armo-iter_2](https://huggingface.co/Cornell-AGI/REFUEL-Llama-3-Armo-iter_2). The generation code is available at https://github.com/ZhaolinGao/REFUEL. ## Evaluations <table> <tr> <th rowspan="2">Method</th> <th rowspan="2">Dataset</th> <th colspan="6">Winrate at Turn</th> </tr> <tr> <th>h = 1</th> <th>h = 2</th> <th>h = 3</th> <th>h = 4</th> <th>H = 5</th> <th>avg</th> </tr> <tr> <td>Llama-3.1-70B-it</td> <td> N/A </td> <td>70.4</td> <td>66.4</td> <td>61.0</td> <td>53.0</td> <td>55.4</td> <td>61.24</td> </tr> <tr> <td><a href="https://huggingface.co/Cornell-AGI/REFUEL-Llama-3-Armo-iter_1">REFUEL-Llama-3-Armo-iter_1</a></td> <td><a href="https://huggingface.co/datasets/Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_1">REFUEL-Ultrainteract-Llama-3-Armo-iter_1</a></td> <td>54.6</td> <td>53.6</td> <td>57.8</td> <td>56.2</td> <td>59.4</td> <td>56.32</td> </tr> <tr> <td><a href="https://huggingface.co/Cornell-AGI/REFUEL-Llama-3-Armo-iter_2">REFUEL-Llama-3-Armo-iter_2</a></td> <td><a href="https://huggingface.co/datasets/Cornell-AGI/REFUEL-Ultrainteract-Llama-3-Armo-iter_2">REFUEL-Ultrainteract-Llama-3-Armo-iter_2</a></td> <td>55.2</td> <td>53.4</td> <td>58.8</td> <td>57.2</td> <td>58.6</td> <td>56.64</td> </tr> </table> ## Citation Please cite our paper if you use this dataset in your own work: ``` @misc{gao2024regressingrelativefutureefficient, title={Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF}, author={Zhaolin Gao and Wenhao Zhan and Jonathan D. Chang and Gokul Swamy and Kianté Brantley and Jason D. Lee and Wen Sun}, year={2024}, eprint={2410.04612}, archivePrefix={arXiv}, primaryClass={cs.LG}, url={https://arxiv.org/abs/2410.04612}, } ```

提供机构：

Cornell-AGI

5,000+

优质数据集

54 个

任务类型

进入经典数据集