summarize_from_feedback
Source: ModelScope community (魔搭社区). Updated 2026-01-02; indexed 2025-01-11.
Download link: https://modelscope.cn/datasets/openai-mirror/summarize_from_feedback
# Dataset Card for Summarize from Feedback
## Dataset Description
In the [Learning to Summarize from Human Feedback paper](https://arxiv.org/abs/2009.01325), a reward model was trained from human feedback.
The reward model was then used to train a summarization model to align with human preferences. This is the dataset of human feedback that was released for reward modelling.
The dataset has two parts: `comparisons` and `axis`. In the `comparisons` part, human annotators were asked to choose the better of two summaries.
In the `axis` part, human annotators scored the quality of a summary on a Likert scale.
The `comparisons` part only has a train and validation split, and the `axis` part only has a test and validation split.
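
A comparison record can be turned into a (chosen, rejected) pair for reward modelling. The following is a minimal sketch, not the authors' code: the field names `summaries`, `text`, and `choice` are assumptions about the record layout, `to_preference_pair` is a hypothetical helper, and the example record is made up.

```python
from typing import Any, Dict, Tuple

# Splits available for each part, as stated in the dataset card.
SPLITS = {
    "comparisons": ("train", "validation"),
    "axis": ("test", "validation"),
}


def to_preference_pair(record: Dict[str, Any]) -> Tuple[str, str]:
    """Return (chosen, rejected) summary texts for one comparison record.

    Assumed layout (not confirmed by the card): record["summaries"] is a
    list of two dicts with a "text" key, and record["choice"] is the index
    (0 or 1) of the summary the annotator preferred.
    """
    summaries = record["summaries"]
    choice = record["choice"]
    if len(summaries) != 2 or choice not in (0, 1):
        raise ValueError("expected exactly two summaries and a choice of 0 or 1")
    return summaries[choice]["text"], summaries[1 - choice]["text"]


# Made-up record for illustration only; it is not taken from the dataset.
example = {
    "summaries": [{"text": "Summary A"}, {"text": "Summary B"}],
    "choice": 1,
}
chosen, rejected = to_preference_pair(example)
```

With the Hugging Face `datasets` library, records for this part would typically be obtained with `load_dataset("openai/summarize_from_feedback", "comparisons")`; treat the Hub ID as an assumption and check it against the hosting page before relying on it.
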
The summaries used for training the reward model in the paper come from the TL;DR dataset.
Additional validation and test data come from the TL;DR dataset, CNN articles, and Daily Mail articles.
For more information, see the repo [here](https://github.com/openai/summarize-from-feedback#human-feedback-data).
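
For the reward modelling mentioned above, comparison data is commonly fit with a pairwise logistic loss, `-log(sigmoid(r_chosen - r_rejected))`. The card itself does not specify the loss, so the sketch below is an assumption about typical usage, and `pairwise_preference_loss` is a hypothetical name.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) stably.

    The loss is small when the reward model scores the human-preferred
    summary higher than the rejected one, and large otherwise.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); branch on the sign of
    # diff so that exp() is only ever called on a non-positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

When both summaries receive the same reward the loss equals `log(2)`, which gives a convenient baseline for judging whether a reward model has learned anything from the comparisons.
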
## Citation Information
[https://arxiv.org/abs/2009.01325](https://arxiv.org/abs/2009.01325)
```
@inproceedings{stienon2020learning,
author = {Nisan Stiennon and Long Ouyang and Jeff Wu and Daniel M. Ziegler and Ryan Lowe and Chelsea Voss and Alec Radford and Dario Amodei and Paul Christiano},
title = {Learning to summarize from human feedback},
booktitle = {NeurIPS},
year = 2020,
}
```
Dataset added to the Hugging Face Hub with help from [@Tristan](https://huggingface.co/Tristan)
Provider: maas
Created: 2025-01-08
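Overview (collected and summarized by 遇见数据集): This dataset contains human feedback on summaries, used to train a reward model for improving summary generation. It has two parts: `comparisons` (annotators choose the better summary) and `axis` (annotators rate summary quality). The data come from the TL;DR dataset and from CNN and Daily Mail articles.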



