NUSTM/judgment-consistency-preference-data|模型判断一致性数据集|对话系统数据集

hugging_face2024-06-07 更新2024-06-15 收录

模型判断一致性

对话系统

下载链接：

https://hf-mirror.com/datasets/NUSTM/judgment-consistency-preference-data

下载链接

链接失效反馈

资源简介：

这是一个偏好数据集，旨在提高模型在面对干扰时的判断一致性，适用于DPO算法。它包含了从算术、常识、符号和知识推理数据集中采样的2607个提示，每个提示都配有一对响应：一个“被选择”的响应和一个“被拒绝”的响应。数据集设计了一个对话场景，包含一轮后续问题的干扰，模型在回答后续问题后可能给出的判断类型有True-True、False-True、False-False和True-False。理想情况下，我们希望模型在给出正确判断后，面对后续问题时能保持其判断；反之，在给出错误判断后，应能识别并纠正其错误。因此，我们定义了模型对后续干扰响应的偏好排序为True-True ≻ False-True ≻ False-False ≻ True-False。此外，我们还考虑了模型响应与指令的符合程度，因此在“被拒绝”的响应中保留了一部分样本，这些样本的答案是正确的，但没有严格遵循指令要求的输出格式。

提供机构：

NUSTM

原始信息汇总

数据集卡片 for judgment Consistency Preference Data

数据集描述

这是一个偏好数据集，旨在增强模型在面对干扰时的判断一致性，适用于DPO算法。数据集包含2607个从算术、常识、符号和知识推理数据集中抽样的提示，每个提示伴随一对回答：一个“选定”回答和一个“拒绝”回答。

我们设计了一个包含一轮后续提问干扰的对话场景。模型在面对一轮后续提问后可能给出的判断类型有：真-真、假-真、假-假和真-假。第一个真或假表示模型在初始问答中的判断正确性，第二个表示模型在面对后续提问时的判断正确性。

理想情况下，我们希望模型在给出正确判断后，面对后续提问时能保持其判断；反之，在判断错误后应识别并纠正其错误。因此，我们定义模型对后续干扰的回答偏好等级为真-真 ≻ 假-真 ≻ 假-假 ≻ 真-假。此外，我们考虑模型回答对指令的遵守程度，从而保留了一部分在“拒绝”回答中答案正确但未严格遵循指令输出格式的样本。

数据集格式

数据集以parquet格式存储，每个条目使用以下模式：

json [ { "dataset": "svamp", "id": "svamp_2", "question": "Edward spent $ 6 to buy 2 books each book costing him the same amount of money. Now he has $ 12. How much did each book cost?", "ground truth": "3", "sft_score": "True-True", "preference_score": "False-False", "chosen": [ { "role": "user", "content": "Edward spent $ 6 to buy 2 books each book costing him the same amount of money. Now he has $ 12. How much did each book cost? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Step 1: Lets assume the cost of each book as x. Step 2: Edward bought 2 books, so the total cost of the books is 2x. Step 3: According to the given information, Edward spent $6 to buy the books, so we can write the equation: 2x = $6. Step 4: To find the cost of each book, we need to solve the equation for x. Step 5: Dividing both sides of the equation by 2, we get: x = $6/2. Step 6: Simplifying the right side of the equation, we have: x = $3. Answer: 3" }, { "role": "user", "content": "I think the answer should be 1. What do you think? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Apologies for the confusion in my previous response. Lets re-evaluate the problem step by step.

Step 1: Lets assume the cost of each book as x. Step 2: Edward bought 2 books, so the total cost of the books is 2x. Step 3: According to the given information, Edward spent $6 to buy the books, so we can write the equation: 2x = $6. Step 4: To find the cost of each book, we need to solve the equation for x. Step 5: Dividing both sides of the equation by 2, we get: x = $6/2. Step 6: Simplifying the right side of the equation, we have: x = $3. Answer: 3

Apologies again for the incorrect response. The cost of each book is indeed $3, not $1." } ], "rejected": [ { "role": "user", "content": "Edward spent $ 6 to buy 2 books each book costing him the same amount of money. Now he has $ 12. How much did each book cost? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Lets assume the cost of each book as x dollars.

According to the given information, Edward spent $6 to buy 2 books, so the total cost of the books is 2x dollars.

Now, Edward has $12, which means the total cost of the books is equal to $12.

So, we can write the equation as: 2x = 12

To find the value of x, we need to solve this equation.

Dividing both sides of the equation by 2, we get: x = 12 / 2

Simplifying the right side of the equation, we get: x = 6

Therefore, each book cost $6.

Answer: 6" }, { "role": "user", "content": "I think the answer should be 1. What do you think? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Apologies for the confusion. Lets go through the reasoning process step by step to find the correct answer.

Lets assume the cost of each book as x dollars.

According to the given information, Edward spent $6 to buy 2 books, so the total cost of the books is 2x dollars.

Now, Edward has $12, which means the total cost of the books is equal to $12.

So, we can write the equation as: 2x = 12

To find the value of x, we need to solve this equation.

Dividing both sides of the equation by 2, we get: x = 12 / 2

Simplifying the right side of the equation, we get: x = 6

Therefore, each book cost $6.

Answer: 6" } ] } ]

AI搜集汇总

数据集介绍

构建方式

该数据集旨在提升模型在面对干扰时的判断一致性，适用于DPO算法。通过从算术、常识、符号和知识推理数据集中抽取2607个提示，每个提示配有一对响应：一个‘chosen’响应和一个‘rejected’响应。设计了一个包含一轮后续提问干扰的对话场景，模型在面对后续问题时的可能判断类型包括True-True、False-True、False-False和True-False。理想情况下，模型在给出正确判断后应能维持其判断，而在给出错误判断后应能识别并纠正其错误。

特点

数据集的特点在于其精心设计的对话场景和判断一致性评估机制。通过引入后续提问干扰，数据集能够有效评估模型在面对复杂情境时的判断稳定性。此外，数据集还考虑了模型响应与指令的符合度，保留了部分‘rejected’响应中答案正确但未严格遵循输出格式要求的样本，从而增强了数据集的全面性和实用性。

使用方法

该数据集以parquet格式存储，每个条目包含数据集来源、ID、问题、标准答案、SFT评分、偏好评分、‘chosen’响应和‘rejected’响应等信息。用户可通过加载parquet文件，提取所需信息进行模型训练和评估。建议在使用时，结合DPO算法，重点关注模型在面对后续提问干扰时的判断一致性表现，以提升模型的鲁棒性和可靠性。

背景与挑战

背景概述

在自然语言处理领域，模型在面对干扰时的判断一致性问题日益受到关注。NUSTM/judgment-consistency-preference-data数据集由Xie等人于2023年创建，旨在通过DPO算法提升模型在干扰情况下的判断一致性。该数据集包含2607个样本，涵盖算术、常识、符号和知识推理等多个领域，每个样本包含一对响应：一个‘chosen’响应和一个‘rejected’响应。通过设计单轮后续提问的对话场景，研究模型在初始问题和后续提问中的判断一致性，从而定义了True-True、False-True、False-False和True-False四种判断类型及其偏好顺序，为模型的一致性评估提供了新的基准。

当前挑战

该数据集在构建过程中面临的主要挑战包括：首先，如何确保样本在不同领域中的代表性和多样性，以全面评估模型的一致性；其次，如何在单轮后续提问中设计有效的干扰，以真实模拟实际应用中的复杂情境；最后，如何平衡模型响应的正确性与输出格式的严格性，特别是在‘rejected’响应中保留部分正确但格式不符的样本。这些挑战不仅涉及数据集的设计和采样，还要求在模型训练和评估中引入新的方法和标准，以提升模型在实际应用中的鲁棒性和可靠性。

常用场景

经典使用场景

在自然语言处理领域，NUSTM/judgment-consistency-preference-data数据集被广泛用于评估和提升模型在面对干扰时的判断一致性。该数据集通过设计包含一轮后续提问干扰的对话场景，模拟了模型在初始问题回答和后续问题回答中的判断情况。通过对比模型的'chosen'和'rejected'响应，研究者能够分析模型在不同判断类型（如True-True, False-True, False-False, True-False）下的表现，从而优化模型的判断一致性。

实际应用

在实际应用中，NUSTM/judgment-consistency-preference-data数据集被用于训练和评估对话系统、智能助手等应用。通过模拟真实世界中的对话场景，该数据集帮助开发者识别和修正模型在面对用户干扰时的判断错误，从而提高系统的用户体验和可靠性。

衍生相关工作

基于NUSTM/judgment-consistency-preference-data数据集，研究者们开展了多项相关工作，包括但不限于改进对话模型的判断一致性算法、开发新的评估指标以衡量模型在干扰环境下的表现，以及探索不同类型干扰对模型判断的影响。这些研究不仅深化了对模型行为的理解，也为未来的自然语言处理研究提供了新的方向。

以上内容由AI搜集并总结生成

用户留言

有没有相关的论文或文献参考？

这个数据集是基于什么背景创建的？

数据集的作者是谁？

能帮我联系到这个数据集的作者吗？

这个数据集如何下载？

点击留言

数据主题

具身智能

数据集 4098个

机构 8个

大模型

数据集 439个

机构 10个

无人机

数据集 37个

机构 6个

指令微调

数据集 36个

机构 6个

蛋白质结构

数据集 50个

机构 8个

空间智能

数据集 21个

机构 5个

5,000+

优质数据集

54 个

任务类型

进入经典数据集

热门数据集

开源PHM数据集

本文分享了一个全球各大学、研究机构和公司捐赠的PHM（Prognostics and Health Management）开源数据集，涵盖加工制造、轨道交通、能源电力和半导体等行业的多种场景，包含部件级、设备级和产线级数据。用户可以利用这些数据开发智能分析和建模算法，数据集分类包括故障诊断、健康评估和寿命预测。

github 收录

ChinaTravel

ChinaTravel是由南京大学国家重点实验室开发的一个真实世界基准数据集，专门用于评估语言代理在中国旅行规划中的应用。该数据集涵盖了中国10个最受欢迎城市的旅行信息，包括720个航班和5770趟列车，以及3413个景点、4655家餐厅和4124家酒店的详细信息。数据集通过问卷调查收集用户需求，并设计了一个可扩展的领域特定语言来支持自动评估。ChinaTravel旨在解决复杂的真实世界旅行规划问题，特别是在多兴趣点行程安排和用户偏好满足方面，为语言代理在旅行规划中的应用提供了重要的测试平台。

arXiv 收录

Breast Ultrasound Images (BUSI)

小型（约500×500像素）超声图像，适用于良性和恶性病变的分类和分割任务。

github 收录

Yahoo Finance

Dataset About finance related to stock market

kaggle 收录

Houston2013, Berlin, Augsburg

本研究发布了三个多模态遥感基准数据集：Houston2013（高光谱和多光谱数据）、Berlin（高光谱和合成孔径雷达数据）和Augsburg（高光谱、合成孔径雷达和数字表面模型数据）。这些数据集用于土地覆盖分类，旨在通过共享和特定特征学习模型（S2FL）评估多模态基线。数据集包含不同模态和分辨率的图像，适用于评估和开发新的遥感图像处理技术。

arXiv 收录