RLHF-And-Friends/SFT-vs-Shapa-CoPPO-8x

Name: RLHF-And-Friends/SFT-vs-Shapa-CoPPO-8x
Creator: RLHF-And-Friends
Published: 2025-03-21 13:22:06
License: not specified

Source: Hugging Face. Updated: 2025-03-21. Indexed: 2025-04-26.

Download link:

https://hf-mirror.com/datasets/RLHF-And-Friends/SFT-vs-Shapa-CoPPO-8x

Description:

This dataset contains the responses of two models to the same prompts. The prompt column holds the prompt given to both models, and the other two columns hold the corresponding model responses. The models are TLDR-Mistral-7B-SFT (left) and ppo-8x-mistral-7b-smallsft-tldr (right). The prompts come from the original dataset RLHF-And-Friends/tldr-ppo. As judged by gpt-4o-mini, ppo-8x-mistral-7b-smallsft-tldr has a win rate of 0.78 over TLDR-Mistral-7B-SFT.
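As a rough illustration of how a pairwise win rate like the one above could be computed, here is a minimal sketch. It assumes one judge verdict per prompt, labeled "left", "right", or "tie", and it assumes ties are excluded from the denominator. The card does not state the judging prompt or how ties were handled, so the labels, the function name `win_rate`, and the tie policy are all hypothetical.

```python
from collections import Counter


def win_rate(verdicts, winner="right", tie_label="tie"):
    """Return the fraction of decided (non-tie) comparisons won by `winner`.

    Assumption: ties are dropped from the denominator. The dataset card
    does not say how ties were treated in the reported 0.78 figure.
    """
    counts = Counter(verdicts)
    # Count every comparison that has a clear winner.
    decided = sum(n for label, n in counts.items() if label != tie_label)
    if decided == 0:
        raise ValueError("no decided comparisons")
    return counts[winner] / decided


# Made-up verdicts for illustration only, not taken from the dataset.
example_verdicts = ["right", "right", "left", "right", "tie"]
example_rate = win_rate(example_verdicts)  # 3 wins out of 4 decided
```

Under a different tie policy, for example counting a tie as half a win, the same verdicts would give a different number, so any comparison with the reported figure depends on knowing which convention was used.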

Provider:

RLHF-And-Friends
