
HongchengGao/Nectar_binarized

Source: Hugging Face. Updated: 2023-12-09. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HongchengGao/Nectar_binarized
Description:
## Data Description

This is a pre-processed version of the [Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar) dataset. It was processed in the same manner as [ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), which was used to train Zephyr-7B-β, a state-of-the-art chat model at the 7B parameter scale.

This dataset can be used directly with the [alignment-handbook](https://github.com/huggingface/alignment-handbook/tree/main) to run **DPO** training on your models with the [Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar) preference data.

The original [Nectar](https://huggingface.co/datasets/berkeley-nest/Nectar) dataset consists of 183k prompts, along with high-quality and diverse responses and accurate ranking labels. We use the rank-1 response as "chosen" and randomly select one response from ranks 2 to 7 as "rejected".

## Citation

If you find this dataset useful in your work, please cite the original Nectar dataset: https://huggingface.co/datasets/berkeley-nest/Nectar

You may also wish to cite our repo:

<pre><code>@misc{gao2023nectarb,
  title = {Nectar_binarized},
  url = {https://huggingface.co/datasets/HongchengGao/Nectar_binarized/blob/main/README.md},
  author = {Hongcheng Gao},
  month = {December},
  year = {2023}
}
</code></pre>
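
The binarization rule stated in the Data Description (rank-1 response as "chosen", one randomly selected response from the remaining ranks as "rejected") can be sketched as below. This is only an illustration under stated assumptions, not the script used to build this dataset: the input field names (`prompt`, `answers`, `answer`, `rank`), the flat output layout, the helper name `binarize`, and the toy example are all hypothetical, and the actual preprocessing and seeding may differ.

```python
import random


def binarize(example, rng):
    """Turn one ranked example into a single (chosen, rejected) pair.

    Assumes `example` has a "prompt" string and an "answers" list whose
    items carry an "answer" string and an integer "rank" (1 = best).
    These field names are assumptions made for illustration.
    """
    # Order responses so that the rank-1 (best) response comes first.
    ranked = sorted(example["answers"], key=lambda a: a["rank"])
    if len(ranked) < 2:
        raise ValueError("need at least two ranked responses")

    chosen = ranked[0]
    # Sample the rejected response uniformly from all lower-ranked ones.
    rejected = rng.choice(ranked[1:])

    return {
        "prompt": example["prompt"],
        "chosen": chosen["answer"],
        "rejected": rejected["answer"],
    }


# Toy example with seven ranked responses (hypothetical data).
example = {
    "prompt": "What is 2 + 2?",
    "answers": [
        {"answer": f"response ranked {r}", "rank": r}
        for r in (3, 1, 7, 2, 5, 4, 6)
    ],
}

rng = random.Random(0)  # fixed seed so the sampling is reproducible
pair = binarize(example, rng)
print(pair["chosen"])
print(pair["rejected"])
```

Passing an explicit `random.Random` instance keeps the sampling reproducible without touching the global random state, which matters if the pairs need to be regenerated later.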
Provider:
HongchengGao
