offsetbias

Name: offsetbias
Creator: maas
Published: 2025-12-05 16:43:08
License: No description available

ModelScope community. Updated: 2025-12-05. Indexed: 2025-08-23.

Download link:

https://modelscope.cn/datasets/NCSOFT/offsetbias


Resource description:

# Dataset Card for OffsetBias

## Dataset Description

- 💻 **Repository:** [https://github.com/ncsoft/offsetbias](https://github.com/ncsoft/offsetbias)
- 📜 **Paper:** [OffsetBias: Leveraging Debiased Data for Tuning Evaluators](https://arxiv.org/abs/2407.06551)

### Dataset Summary

**OffsetBias** is a pairwise preference dataset intended to reduce common biases inherent in judge models (language models specialized in evaluation). The dataset is introduced in the paper **OffsetBias: Leveraging Debiased Data for Tuning Evaluators**. OffsetBias contains 8,504 samples for training judge (or reward) models.

## Dataset Structure

### Data Instance

```json
{
  "instruction": "Rewrite the following sentence, paying attention to spelling and grammar. Distance learning is harder then in-person classes",
  "output_1": "Distance learning is harder than in-person classes.",
  "output_2": "The original sentence is grammatically correct and the punctuation is also appropriate. However, if you wish to slightly rephrase it for clarity or emphasis, one possibility is:\n\n\"Studying online can be more challenging than in traditional classroom settings.\" \n\nThis slight tweak emphasizes the comparison between \"studying online\" and \"studying... in traditional classroom settings.\"",
  "label": 1
}
```

### Data Fields

- `instruction`: a string, the instruction that the responses need to answer
- `output_1`: a string, the first candidate response to the instruction
- `output_2`: a string, the second candidate response to the instruction
- `label`: an integer, 1 or 2, indicating the number of the better response

## Dataset Construction

The instructions are taken from source datasets including [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [Ultrachat](https://huggingface.co/datasets/stingning/ultrachat), [Evol-Instruct](https://huggingface.co/datasets/WizardLMTeam/WizardLM_evol_instruct_70k) and [Flan](https://github.com/google-research/FLAN). The outputs are created using GPT-3.5, GPT-4 and Claude-3-Opus models. See the paper for details about the data construction methods.

## Considerations for Using the Data

This dataset is intended to be a collection of counter-examples to existing common biases in evaluation models. Training an evaluation model on this dataset alone is therefore not suitable, as it would introduce another bias. It is best used as an additional dataset that complements other general-purpose evaluation datasets.

## Citation

**BibTeX:**

```bibtex
@misc{park2024offsetbias,
  title={OffsetBias: Leveraging Debiased Data for Tuning Evaluators},
  author={Junsoo Park and Seungyeon Jwa and Meiying Ren and Daeyoung Kim and Sanghyuk Choi},
  year={2024},
  eprint={2407.06551},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
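As an illustration of the data fields described in the card, the sketch below maps one record to a prompt/chosen/rejected triple, a shape commonly used when training reward models. The `PreferencePair` type and the `to_preference_pair` helper are hypothetical names introduced here and are not part of the dataset or its repository. The sample is an abbreviated copy of the data instance shown in the card.

```python
import json
from typing import TypedDict


class PreferencePair(TypedDict):
    """Hypothetical container; not defined by the OffsetBias dataset."""
    prompt: str
    chosen: str
    rejected: str


def to_preference_pair(record: dict) -> PreferencePair:
    """Map one record to a prompt/chosen/rejected triple using `label`."""
    label = record["label"]
    if label not in (1, 2):
        # The card documents `label` as an integer of 1 or 2.
        raise ValueError(f"label must be 1 or 2, got {label!r}")
    if label == 1:
        chosen_key, rejected_key = "output_1", "output_2"
    else:
        chosen_key, rejected_key = "output_2", "output_1"
    return {
        "prompt": record["instruction"],
        "chosen": record[chosen_key],
        "rejected": record[rejected_key],
    }


# Abbreviated version of the data instance from the card (output_2 shortened).
sample = json.loads("""
{
  "instruction": "Rewrite the following sentence, paying attention to spelling and grammar. Distance learning is harder then in-person classes",
  "output_1": "Distance learning is harder than in-person classes.",
  "output_2": "The original sentence is grammatically correct and the punctuation is also appropriate.",
  "label": 1
}
""")

pair = to_preference_pair(sample)
print(pair["chosen"])
```

Because the card advises against training on OffsetBias alone, pairs produced this way would normally be combined with pairs from a general-purpose preference dataset before training; the mixing ratio is left to the user.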


Provider:

maas

Created:

2025-07-24
