RLHF-And-Friends/SFT-vs-Shapa-CoPPO-4x

Name: RLHF-And-Friends/SFT-vs-Shapa-CoPPO-4x
Creator: RLHF-And-Friends
Published: 2025-03-21 12:57:06
License: Not specified

Source: Hugging Face. Updated: 2025-03-21. Indexed: 2025-04-26

Download link:

https://hf-mirror.com/datasets/RLHF-And-Friends/SFT-vs-Shapa-CoPPO-4x

Description:

This dataset contains the responses of two models to the same prompts. The prompt column holds the prompt given to both models, and the other two columns hold the response from each model. The models are TLDR-Mistral-7B-SFT (left) and ppo-4x-mistral-7b-smallsft-tldr (right). The original dataset is RLHF-And-Friends/tldr-ppo. With gpt-4o-mini as the judge, ppo-4x-mistral-7b-smallsft-tldr has a win rate of 0.31 against TLDR-Mistral-7B-SFT.
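The description reports a win rate but does not say how the judge's verdicts were aggregated. The following is a minimal sketch of one common convention, assuming each comparison yields a verdict of "left", "right", or "tie" and that ties count as non-wins. The verdict labels, the tie handling, and the function name win_rate are illustrative assumptions, not part of the dataset, and the sample verdicts are placeholders rather than real judge output.

```python
from collections import Counter


def win_rate(verdicts, candidate="right"):
    """Return the fraction of comparisons won by `candidate`.

    Assumption: each verdict is one of "left", "right", or "tie",
    and ties are counted as non-wins for the candidate.
    """
    if not verdicts:
        raise ValueError("verdicts must not be empty")
    counts = Counter(verdicts)
    return counts[candidate] / len(verdicts)


# Placeholder verdicts for illustration only (not taken from the dataset).
sample_verdicts = ["left", "right", "left", "tie"]
print(win_rate(sample_verdicts))  # 1 win for "right" out of 4 comparisons
```

Under this convention, a win rate of 0.31 for the right-hand model would mean it was preferred in 31% of the judged comparisons.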

Provided by:

RLHF-And-Friends
