RLHF-And-Friends/Human-vs-Shapa-8x
Source: Hugging Face. Updated 2025-03-28. Indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/RLHF-And-Friends/Human-vs-Shapa-8x
Description:
This dataset pairs human-written completions from the test split of RLHF-And-Friends/tldr-sft with completions generated by the borisshapa/ppo-8x-mistral-7b-smallsft-tldr model. The prompt column holds the prompt given to both the human writers and the model. When gpt-4o-mini and gpt-4o are used as evaluators, the model's completions achieve a high win rate against the human completions.
Provided by:
RLHF-And-Friends



