webgpt_comparisons

Name: webgpt_comparisons
Creator: maas
Published: 2025-12-03 17:25:52
License: No description provided

ModelScope community; updated 2025-12-03; indexed 2025-01-11

Download link:

https://modelscope.cn/datasets/openai-mirror/webgpt_comparisons


Resource description:

# Dataset Card for WebGPT Comparisons

## Dataset Description

In the [WebGPT paper](https://arxiv.org/abs/2112.09332), the authors trained a reward model from human feedback. They used the reward model to train a long-form question-answering model to align with human preferences. This is the dataset of all comparisons that were marked as suitable for reward modeling by the end of the WebGPT project. There are 19,578 comparisons in total.

Each example in the dataset contains a pair of model answers for a question, and the associated metadata. Each answer has a preference score from humans that can be used to determine which of the two answers is better. Overall, an example has the following fields:

* `question`: The text of the question, together with the name of the dataset from which it was taken and a unique ID.
* `quotes_0`: The extracts that the model found while browsing for `answer_0`, together with the title of the page on which each extract was found, constructed from the HTML title and domain name of the page.
* `answer_0`: The final answer that the model composed using `quotes_0`.
* `tokens_0`: The prefix that would have been given to the model in the final step of the episode to create `answer_0`, and the completion given by the model or human. The prefix is made up of the question and the quotes, with some truncation, and the completion is simply the answer. Both are tokenized using the GPT-2 tokenizer. The concatenation of the prefix and completion is the input used for reward modeling.
* `score_0`: The strength of the preference for `answer_0` over `answer_1` as a number from −1 to 1. It sums to 0 with `score_1`, and an answer is preferred if and only if its score is positive. For reward modeling, we treat scores of 0 as soft 50% labels, and all other scores as hard labels (using only their sign).
* `quotes_1`: The counterpart to `quotes_0`.
* `answer_1`: The counterpart to `answer_0`.
* `tokens_1`: The counterpart to `tokens_0`.
* `score_1`: The counterpart to `score_0`.

This information was found in Appendix K of the WebGPT paper.

## Citation Information

[https://arxiv.org/abs/2112.09332](https://arxiv.org/abs/2112.09332)

```
@inproceedings{nakano2021webgpt,
  author = {Reiichiro Nakano and Jacob Hilton and Suchir Balaji and Jeff Wu and Long Ouyang and Christina Kim and Christopher Hesse and Shantanu Jain and Vineet Kosaraju and William Saunders and Xu Jiang and Karl Cobbe and Tyna Eloundou and Gretchen Krueger and Kevin Button and Matthew Knight and Benjamin Chess and John Schulman},
  title = {WebGPT: Browser-assisted question-answering with human feedback},
  booktitle = {arXiv},
  year = 2021,
}
```

Dataset added to the Hugging Face Hub by [@Tristan](https://huggingface.co/Tristan) and [@natolambert](https://huggingface.co/natolambert)
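The labeling rule given for `score_0` and the prefix/completion concatenation given for `tokens_0` can be illustrated with a small sketch. It is not the authors' code: the helper names `preference_target` and `reward_model_input` are hypothetical, the `"prefix"` and `"completion"` keys inside `tokens_0` are an assumed layout, and the record below uses made-up scores and token IDs rather than real dataset rows.

```python
from typing import Dict, List


def preference_target(score_0: float, score_1: float) -> float:
    """Return the target probability that answer_0 is the preferred answer.

    Follows the rule stated in the card: a score of 0 becomes a soft 50%
    label, and any other score becomes a hard label based only on its sign.
    """
    if abs(score_0 + score_1) > 1e-6:
        raise ValueError("score_0 and score_1 are expected to sum to 0")
    if score_0 > 0:
        return 1.0  # answer_0 preferred
    if score_0 < 0:
        return 0.0  # answer_1 preferred
    return 0.5  # tie: soft label


def reward_model_input(tokens: Dict[str, List[int]]) -> List[int]:
    """Concatenate prefix and completion token IDs into one sequence.

    Assumption: the field is a mapping with "prefix" and "completion" keys,
    each holding a list of GPT-2 token IDs.
    """
    return list(tokens["prefix"]) + list(tokens["completion"])


# Hypothetical record; scores and token IDs are invented for illustration.
example = {
    "score_0": -0.5,
    "score_1": 0.5,
    "tokens_0": {"prefix": [10, 11, 12], "completion": [20, 21]},
    "tokens_1": {"prefix": [10, 11, 13], "completion": [30]},
}

target = preference_target(example["score_0"], example["score_1"])
input_0 = reward_model_input(example["tokens_0"])
input_1 = reward_model_input(example["tokens_1"])
print(target, input_0, input_1)
```

Only the sign of a non-zero score is used here, so a score of −0.5 and a score of −1 yield the same hard label; a training setup that wants to use preference strength would need a different mapping.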


Provider:

maas

Created:

2025-01-08
