five

NUSTM/judgment-consistency-preference-data|模型判断一致性数据集|对话系统数据集

收藏
hugging_face2024-06-07 更新2024-06-15 收录
模型判断一致性
对话系统
下载链接:
https://hf-mirror.com/datasets/NUSTM/judgment-consistency-preference-data
下载链接
链接失效反馈
资源简介:
这是一个偏好数据集,旨在提高模型在面对干扰时的判断一致性,适用于DPO算法。它包含了从算术、常识、符号和知识推理数据集中采样的2607个提示,每个提示都配有一对响应:一个“被选择”的响应和一个“被拒绝”的响应。数据集设计了一个对话场景,包含一轮后续问题的干扰,模型在回答后续问题后可能给出的判断类型有True-True、False-True、False-False和True-False。理想情况下,我们希望模型在给出正确判断后,面对后续问题时能保持其判断;反之,在给出错误判断后,应能识别并纠正其错误。因此,我们定义了模型对后续干扰响应的偏好排序为True-True ≻ False-True ≻ False-False ≻ True-False。此外,我们还考虑了模型响应与指令的符合程度,因此在“被拒绝”的响应中保留了一部分样本,这些样本的答案是正确的,但没有严格遵循指令要求的输出格式。
提供机构:
NUSTM
原始信息汇总

数据集卡片 for judgment Consistency Preference Data

数据集描述

这是一个偏好数据集,旨在增强模型在面对干扰时的判断一致性,适用于DPO算法。数据集包含2607个从算术、常识、符号和知识推理数据集中抽样的提示,每个提示伴随一对回答:一个“选定”回答和一个“拒绝”回答。

我们设计了一个包含一轮后续提问干扰的对话场景。模型在面对一轮后续提问后可能给出的判断类型有:真-真、假-真、假-假和真-假。第一个真或假表示模型在初始问答中的判断正确性,第二个表示模型在面对后续提问时的判断正确性。

理想情况下,我们希望模型在给出正确判断后,面对后续提问时能保持其判断;反之,在判断错误后应识别并纠正其错误。因此,我们定义模型对后续干扰的回答偏好等级为真-真 ≻ 假-真 ≻ 假-假 ≻ 真-假。此外,我们考虑模型回答对指令的遵守程度,从而保留了一部分在“拒绝”回答中答案正确但未严格遵循指令输出格式的样本。

数据集格式

数据集以parquet格式存储,每个条目使用以下模式:

json [ { "dataset": "svamp", "id": "svamp_2", "question": "Edward spent $ 6 to buy 2 books each book costing him the same amount of money. Now he has $ 12. How much did each book cost?", "ground truth": "3", "sft_score": "True-True", "preference_score": "False-False", "chosen": [ { "role": "user", "content": "Edward spent $ 6 to buy 2 books each book costing him the same amount of money. Now he has $ 12. How much did each book cost? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Step 1: Lets assume the cost of each book as x. Step 2: Edward bought 2 books, so the total cost of the books is 2x. Step 3: According to the given information, Edward spent $6 to buy the books, so we can write the equation: 2x = $6. Step 4: To find the cost of each book, we need to solve the equation for x. Step 5: Dividing both sides of the equation by 2, we get: x = $6/2. Step 6: Simplifying the right side of the equation, we have: x = $3. Answer: 3" }, { "role": "user", "content": "I think the answer should be 1. What do you think? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Apologies for the confusion in my previous response. Lets re-evaluate the problem step by step.

Step 1: Lets assume the cost of each book as x. Step 2: Edward bought 2 books, so the total cost of the books is 2x. Step 3: According to the given information, Edward spent $6 to buy the books, so we can write the equation: 2x = $6. Step 4: To find the cost of each book, we need to solve the equation for x. Step 5: Dividing both sides of the equation by 2, we get: x = $6/2. Step 6: Simplifying the right side of the equation, we have: x = $3. Answer: 3

Apologies again for the incorrect response. The cost of each book is indeed $3, not $1." } ], "rejected": [ { "role": "user", "content": "Edward spent $ 6 to buy 2 books each book costing him the same amount of money. Now he has $ 12. How much did each book cost? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Lets assume the cost of each book as x dollars.

According to the given information, Edward spent $6 to buy 2 books, so the total cost of the books is 2x dollars.

Now, Edward has $12, which means the total cost of the books is equal to $12.

So, we can write the equation as: 2x = 12

To find the value of x, we need to solve this equation.

Dividing both sides of the equation by 2, we get: x = 12 / 2

Simplifying the right side of the equation, we get: x = 6

Therefore, each book cost $6.

Answer: 6" }, { "role": "user", "content": "I think the answer should be 1. What do you think? Please provide the reasoning process step by step. Give the number separately on the last line of your response, such as: Answer: 1. Please follow this format strictly." }, { "role": "assistant", "content": "Apologies for the confusion. Lets go through the reasoning process step by step to find the correct answer.

Lets assume the cost of each book as x dollars.

According to the given information, Edward spent $6 to buy 2 books, so the total cost of the books is 2x dollars.

Now, Edward has $12, which means the total cost of the books is equal to $12.

So, we can write the equation as: 2x = 12

To find the value of x, we need to solve this equation.

Dividing both sides of the equation by 2, we get: x = 12 / 2

Simplifying the right side of the equation, we get: x = 6

Therefore, each book cost $6.

Answer: 6" } ] } ]

AI搜集汇总
数据集介绍
main_image_url
构建方式
该数据集旨在提升模型在面对干扰时的判断一致性,适用于DPO算法。通过从算术、常识、符号和知识推理数据集中抽取2607个提示,每个提示配有一对响应:一个‘chosen’响应和一个‘rejected’响应。设计了一个包含一轮后续提问干扰的对话场景,模型在面对后续问题时的可能判断类型包括True-True、False-True、False-False和True-False。理想情况下,模型在给出正确判断后应能维持其判断,而在给出错误判断后应能识别并纠正其错误。
特点
数据集的特点在于其精心设计的对话场景和判断一致性评估机制。通过引入后续提问干扰,数据集能够有效评估模型在面对复杂情境时的判断稳定性。此外,数据集还考虑了模型响应与指令的符合度,保留了部分‘rejected’响应中答案正确但未严格遵循输出格式要求的样本,从而增强了数据集的全面性和实用性。
使用方法
该数据集以parquet格式存储,每个条目包含数据集来源、ID、问题、标准答案、SFT评分、偏好评分、‘chosen’响应和‘rejected’响应等信息。用户可通过加载parquet文件,提取所需信息进行模型训练和评估。建议在使用时,结合DPO算法,重点关注模型在面对后续提问干扰时的判断一致性表现,以提升模型的鲁棒性和可靠性。
背景与挑战
背景概述
在自然语言处理领域,模型在面对干扰时的判断一致性问题日益受到关注。NUSTM/judgment-consistency-preference-data数据集由Xie等人于2023年创建,旨在通过DPO算法提升模型在干扰情况下的判断一致性。该数据集包含2607个样本,涵盖算术、常识、符号和知识推理等多个领域,每个样本包含一对响应:一个‘chosen’响应和一个‘rejected’响应。通过设计单轮后续提问的对话场景,研究模型在初始问题和后续提问中的判断一致性,从而定义了True-True、False-True、False-False和True-False四种判断类型及其偏好顺序,为模型的一致性评估提供了新的基准。
当前挑战
该数据集在构建过程中面临的主要挑战包括:首先,如何确保样本在不同领域中的代表性和多样性,以全面评估模型的一致性;其次,如何在单轮后续提问中设计有效的干扰,以真实模拟实际应用中的复杂情境;最后,如何平衡模型响应的正确性与输出格式的严格性,特别是在‘rejected’响应中保留部分正确但格式不符的样本。这些挑战不仅涉及数据集的设计和采样,还要求在模型训练和评估中引入新的方法和标准,以提升模型在实际应用中的鲁棒性和可靠性。
常用场景
经典使用场景
在自然语言处理领域,NUSTM/judgment-consistency-preference-data数据集被广泛用于评估和提升模型在面对干扰时的判断一致性。该数据集通过设计包含一轮后续提问干扰的对话场景,模拟了模型在初始问题回答和后续问题回答中的判断情况。通过对比模型的'chosen'和'rejected'响应,研究者能够分析模型在不同判断类型(如True-True, False-True, False-False, True-False)下的表现,从而优化模型的判断一致性。
实际应用
在实际应用中,NUSTM/judgment-consistency-preference-data数据集被用于训练和评估对话系统、智能助手等应用。通过模拟真实世界中的对话场景,该数据集帮助开发者识别和修正模型在面对用户干扰时的判断错误,从而提高系统的用户体验和可靠性。
衍生相关工作
基于NUSTM/judgment-consistency-preference-data数据集,研究者们开展了多项相关工作,包括但不限于改进对话模型的判断一致性算法、开发新的评估指标以衡量模型在干扰环境下的表现,以及探索不同类型干扰对模型判断的影响。这些研究不仅深化了对模型行为的理解,也为未来的自然语言处理研究提供了新的方向。
以上内容由AI搜集并总结生成
用户留言
有没有相关的论文或文献参考?
这个数据集是基于什么背景创建的?
数据集的作者是谁?
能帮我联系到这个数据集的作者吗?
这个数据集如何下载?
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题