RLHF-And-Friends/SFT-vs-BaseGRPO

Name: RLHF-And-Friends/SFT-vs-BaseGRPO
Creator: RLHF-And-Friends
Published: 2025-03-21 11:57:05
License: Not specified

Source: Hugging Face. Updated: 2025-03-21. Indexed: 2025-04-26.

Download link:

https://hf-mirror.com/datasets/RLHF-And-Friends/SFT-vs-BaseGRPO

Description:

This dataset contains the responses of two models to the same prompts. One column holds the prompt given to both models, and the other two columns hold each model's response. The left model is TLDR-Mistral-7B-SFT and the right model is TLDR-Mistral-7B-Base-GRPO. The source dataset is RLHF-And-Friends/tldr-ppo. As judged by gpt-4o-mini, TLDR-Mistral-7B-Base-GRPO has a win rate of 0.52 against TLDR-Mistral-7B-SFT.
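
For readers unfamiliar with the metric, a win rate in a two-model comparison is usually the fraction of prompts on which the judge prefers one model's response. The sketch below illustrates that arithmetic only. The verdict format ("left"/"right" labels, one per prompt), the function name `win_rate`, and the absence of ties are assumptions made for illustration; the listing does not say how gpt-4o-mini's judgments were recorded or aggregated, and the toy verdicts are not taken from the dataset.

```python
from collections import Counter


def win_rate(verdicts, model="right"):
    """Return the fraction of pairwise comparisons won by `model`.

    `verdicts` is a sequence of "left"/"right" labels, one per prompt.
    This format is hypothetical, and ties are assumed not to occur.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    unknown = set(counts) - {"left", "right"}
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    return counts[model] / len(verdicts)


# Toy verdicts for illustration only: "right" is preferred on 13 of 25 prompts.
toy_verdicts = ["right"] * 13 + ["left"] * 12
print(win_rate(toy_verdicts))
```

Under this definition a value just above 0.5 means the judge preferred the right-hand model only slightly more often than the left-hand one.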

Provided by:

RLHF-And-Friends
