summarize_from_feedback

Name: summarize_from_feedback
Creator: maas
Published: 2026-01-02 16:19:52
License: No description available

ModelScope community, updated 2026-01-02, indexed 2025-01-11

Download link:

https://modelscope.cn/datasets/openai-mirror/summarize_from_feedback

Resource overview:

# Dataset Card for Summarize from Feedback

## Dataset Description

In the [Learning to Summarize from Human Feedback paper](https://arxiv.org/abs/2009.01325), a reward model was trained from human feedback. The reward model was then used to train a summarization model to align with human preferences. This is the dataset of human feedback that was released for reward modelling.

The dataset has two parts: `comparisons` and `axis`. In the `comparisons` part, human annotators were asked to choose the better of two summaries. In the `axis` part, human annotators scored the quality of a summary on a Likert scale. The `comparisons` part has only a train and a validation split, and the `axis` part has only a test and a validation split.

The summaries used for training the reward model in the paper come from the TL;DR dataset. Additional validation and test data come from the TL;DR dataset, CNN articles, and Daily Mail articles.

For more information, see the repo [here](https://github.com/openai/summarize-from-feedback#human-feedback-data).

## Citation Information

[https://arxiv.org/abs/2009.01325](https://arxiv.org/abs/2009.01325)

```
@inproceedings{stienon2020learning,
  author = {Nisan Stiennon and Long Ouyang and Jeff Wu and Daniel M. Ziegler and Ryan Lowe and Chelsea Voss and Alec Radford and Dario Amodei and Paul Christiano},
  title = {Learning to summarize from human feedback},
  booktitle = {NeurIPS},
  year = 2020,
}
```

Dataset added to the Hugging Face Hub with help from [@Tristan](https://huggingface.co/Tristan)
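As an illustration of how the `comparisons` part might feed reward-model training, the sketch below turns one comparison record into a (chosen, rejected) pair and scores it with a pairwise logistic loss. The card does not spell out the record schema or the training objective, so the field names `summaries`, `text`, and `choice`, and the helper names `to_preference_pair` and `pairwise_loss`, are assumptions made for this example. Only the split layout in `SPLITS` is taken from the description above.

```python
import math

# Split layout as stated in the dataset card.
SPLITS = {
    "comparisons": ("train", "validation"),
    "axis": ("test", "validation"),
}


def to_preference_pair(record):
    """Return (chosen_text, rejected_text) from one comparison record.

    Assumption: the record holds a two-element `summaries` list of dicts,
    each with a `text` key, and an integer `choice` (0 or 1) that indexes
    the summary the annotator preferred.
    """
    summaries = record["summaries"]
    choice = record["choice"]
    if len(summaries) != 2 or choice not in (0, 1):
        raise ValueError("expected two summaries and a choice of 0 or 1")
    return summaries[choice]["text"], summaries[1 - choice]["text"]


def pairwise_loss(reward_chosen, reward_rejected):
    """Negative log-sigmoid of the reward margin (a pairwise logistic loss).

    The loss shrinks as the reward model scores the preferred summary
    higher than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); branch for stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical record, hand-written for illustration only.
    example = {
        "summaries": [{"text": "summary A"}, {"text": "summary B"}],
        "choice": 1,
    }
    chosen, rejected = to_preference_pair(example)
    print(chosen, "|", rejected)
    print(pairwise_loss(1.5, 0.2))
```

In practice the two rewards would come from a learned model scoring each summary against its source post. Here they are plain floats so the example runs on its own.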

Provider:

maas

Created:

2025-01-08

Collection summary

Dataset introduction