RLHF-And-Friends/SFT-vs-Shapa-CoPPO-2x

Name: RLHF-And-Friends/SFT-vs-Shapa-CoPPO-2x
Creator: RLHF-And-Friends
Published: 2025-03-21 12:38:35
License: not specified

Source: Hugging Face. Updated 2025-03-21; indexed 2025-04-26.

Download link:

https://hf-mirror.com/datasets/RLHF-And-Friends/SFT-vs-Shapa-CoPPO-2x

Description:

This dataset contains paired responses from two models to the same prompts. The "prompt" column holds the prompt given to both models, and the other two columns hold each model's response: TLDR-Mistral-7B-SFT on the left and ppo-2x-mistral-7b-smallsft-tldr on the right. The original dataset with the prompts is available at RLHF-And-Friends/tldr-ppo. As judged by gpt-4o-mini, ppo-2x-mistral-7b-smallsft-tldr achieves a win rate of 0.54 against TLDR-Mistral-7B-SFT.
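
As a rough illustration of how a pairwise win rate such as the 0.54 figure can be computed from per-prompt judge verdicts, here is a minimal Python sketch. The verdict labels ("left", "right", "tie"), the function name `win_rate`, and the tie-handling option are assumptions made for this example; the listing does not say how gpt-4o-mini's judgments were encoded or how ties were treated. The toy verdicts are placeholders, not data from the dataset.

```python
from collections import Counter


def win_rate(verdicts, winner="right", ignore_ties=True):
    """Return the fraction of comparisons won by `winner`.

    `verdicts` is an iterable of labels: "left", "right", or "tie".
    These labels are hypothetical; the dataset listing does not
    specify the judge's output format.
    """
    counts = Counter(verdicts)
    wins = counts[winner]
    # Decisive comparisons only, unless ties are counted in the denominator.
    total = counts["left"] + counts["right"]
    if not ignore_ties:
        total += counts["tie"]
    if total == 0:
        raise ValueError("no comparisons to score")
    return wins / total


# Toy verdicts for illustration only; not taken from the dataset.
toy_verdicts = ["right", "left", "right", "tie", "right", "left"]

# 3 wins for the right-hand model out of 5 decisive comparisons.
print(win_rate(toy_verdicts))
# 3 wins out of 6 comparisons when ties count against the winner.
print(win_rate(toy_verdicts, ignore_ties=False))
```

Here "right" would correspond to ppo-2x-mistral-7b-smallsft-tldr and "left" to TLDR-Mistral-7B-SFT, following the column order described above. Whether ties are dropped or counted changes the result, so the convention should be stated alongside any reported win rate.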

Provider:

RLHF-And-Friends
