danilopeixoto/pandora-rlhf
Source: Hugging Face. Updated 2024-03-01; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/danilopeixoto/pandora-rlhf
Resource description:
---
pretty_name: Pandora RLHF
task_categories:
- text-generation
size_categories:
- 100K<n<1M
tags:
- dpo
- fine-tuning
- rlhf
license: bsd-3-clause
---
# Pandora RLHF
A Reinforcement Learning from Human Feedback (RLHF) dataset for Direct Preference Optimization (DPO) fine-tuning of the Pandora Large Language Model (LLM).
The dataset is based on the [anthropic/hh-rlhf](https://huggingface.co/datasets/anthropic/hh-rlhf) dataset.
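The upstream anthropic/hh-rlhf records store each preference pair as two full transcripts, `chosen` and `rejected`, which normally share every turn except the final assistant reply. DPO training code typically expects a `prompt` / `chosen` / `rejected` triple instead. The following is a minimal sketch of that conversion, not the procedure used to build this dataset: the helper name `split_preference_pair`, the example record, and the assumption that both transcripts share a common prefix are illustrative only, and the actual column layout of Pandora RLHF is not documented here.

```python
# Marker that opens an assistant turn in hh-rlhf style transcripts.
ASSISTANT_TAG = "\n\nAssistant:"


def split_preference_pair(chosen: str, rejected: str) -> dict:
    """Split two hh-rlhf style transcripts into prompt / chosen / rejected.

    Hypothetical helper: assumes both transcripts are identical up to and
    including the last "Assistant:" marker of the chosen transcript.
    """
    cut = chosen.rfind(ASSISTANT_TAG)
    if cut == -1:
        raise ValueError("no assistant turn found in 'chosen'")

    # Everything up to and including the final "Assistant:" marker is the prompt.
    prompt = chosen[: cut + len(ASSISTANT_TAG)]
    if not rejected.startswith(prompt):
        raise ValueError("transcripts do not share a common prompt")

    return {
        "prompt": prompt,
        "chosen": chosen[len(prompt):].strip(),
        "rejected": rejected[len(prompt):].strip(),
    }


# Illustrative record, written by hand in the hh-rlhf transcript format.
example = {
    "chosen": "\n\nHuman: What is the capital of France?\n\nAssistant: Paris.",
    "rejected": "\n\nHuman: What is the capital of France?\n\nAssistant: I am not sure.",
}

record = split_preference_pair(example["chosen"], example["rejected"])
print(record)
```

Raising an error when the prefixes differ, rather than guessing a split point, keeps malformed pairs out of the training set; a real pipeline might log and skip such records.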
## Copyright and license
Copyright (c) 2024, Danilo Peixoto Ferreira. All rights reserved.
Project developed under a [BSD-3-Clause license](LICENSE.md).
Provider:
danilopeixoto
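Summary of original information
Pandora RLHF
Overview
Pandora RLHF is a Reinforcement Learning from Human Feedback (RLHF) dataset for Direct Preference Optimization (DPO) fine-tuning of the Pandora Large Language Model (LLM).
Dataset source
The dataset is based on the anthropic/hh-rlhf dataset.
Task categories
- Text generation
Dataset size
- 100K<n<1M
Tags
- dpo
- fine-tuning
- rlhf
License
BSD-3-Clause license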



