danilopeixoto/pandora-rlhf

Name: danilopeixoto/pandora-rlhf
Creator: danilopeixoto
Published: 2024-03-01 09:32:00
License: BSD-3-Clause

Hugging Face | Updated 2024-03-01 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/danilopeixoto/pandora-rlhf

Resource description:

---
pretty_name: Pandora RLHF
task_categories:
- text-generation
size_categories:
- 100K<n<1M
tags:
- dpo
- fine-tuning
- rlhf
license: bsd-3-clause
---

# Pandora RLHF

A Reinforcement Learning from Human Feedback (RLHF) dataset for Direct Preference Optimization (DPO) fine-tuning of the Pandora Large Language Model (LLM).

The dataset is based on the [anthropic/hh-rlhf](https://huggingface.co/datasets/anthropic/hh-rlhf) dataset.

## Copyright and license

Copyright (c) 2024, Danilo Peixoto Ferreira. All rights reserved.

Project developed under a [BSD-3-Clause license](LICENSE.md).

Provided by:

danilopeixoto

Summary of original information

Pandora RLHF

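Overview

Pandora RLHF is a Reinforcement Learning from Human Feedback (RLHF) dataset for Direct Preference Optimization (DPO) fine-tuning of the Pandora Large Language Model (LLM).

Dataset source

The dataset is based on the anthropic/hh-rlhf dataset.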
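
The listing does not describe the record schema of Pandora RLHF. As a rough illustration only, the sketch below assumes records shaped like the upstream anthropic/hh-rlhf data, where each record holds a `chosen` and a `rejected` transcript that share the same dialogue up to the final `Assistant:` turn. The helper name `split_preference_pair` and the output keys `prompt`, `chosen`, and `rejected` are hypothetical choices for this example, not part of the dataset.

```python
# Assumption: records follow the anthropic/hh-rlhf layout, i.e. two full
# transcripts ("chosen" and "rejected") that differ only in the last
# assistant reply. Pandora RLHF itself may use a different schema.
ASSISTANT_MARKER = "\n\nAssistant:"


def split_preference_pair(record):
    """Split one record into a shared prompt and two candidate replies."""
    chosen = record["chosen"]
    rejected = record["rejected"]

    # The prompt is everything up to and including the last assistant marker.
    cut = chosen.rfind(ASSISTANT_MARKER)
    if cut == -1:
        raise ValueError("no assistant turn found in the chosen transcript")
    cut += len(ASSISTANT_MARKER)
    prompt = chosen[:cut]

    # Both transcripts are expected to share that prompt verbatim.
    if not rejected.startswith(prompt):
        raise ValueError("chosen and rejected transcripts do not share a prompt")

    return {
        "prompt": prompt,
        "chosen": chosen[cut:].strip(),
        "rejected": rejected[cut:].strip(),
    }


# Hypothetical record, written by hand for this example.
example = {
    "chosen": "\n\nHuman: Hi\n\nAssistant: Hello!",
    "rejected": "\n\nHuman: Hi\n\nAssistant: Go away.",
}
pair = split_preference_pair(example)
```

A prompt plus a preferred and a dispreferred reply is the usual input shape for DPO-style training, which is why the transcripts are split this way here; check the actual dataset columns before relying on it.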

Task category

Text generation

Dataset size

100K<n<1M

License

BSD-3-Clause license
