DS_Critique_Bank
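DS Critique Bank Dataset Overview
Dataset Description
DS (Digital Socrates) Critique Bank is a dataset of multiple-choice questions, each paired with an answer and explanation produced by a "student model" and with critiques of that explanation produced by a "critique model". Some instances also include human annotations. For details, see the paper arXiv:2311.09613.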
Model Information
- Student models: gpt-4-0613, gpt-3.5-turbo-0613, Llama-2-70b-Chat, Llama-2-7b-Chat
- Critique models: gpt-4-0613, DS-13B, DS-7B (the latter two are Digital Socrates models fine-tuned from Llama-2-Chat models)
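Data Files
The dataset contains the following files:
- DSCB-train-silver.jsonl: 3240 instances with silver GPT-4 critiques
- DSCB-train-crowd-anno.jsonl: 3240 instances with human-annotated GPT-4 critiques
- DSCB-train-expert.jsonl: 198 instances with human-edited critiques
- DSCB-dev-crowd-anno.jsonl: 270 instances with human-annotated GPT-4, DS-13B, and DS-7B critiques
- DSCB-dev-non-anno.jsonl: 6330 instances with GPT-4, DS-13B, and DS-7B critiques
- DSCB-prompts.json: the prompts used to query student models for explanations and critique models for critiques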
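Since the instance files are in JSON Lines format (one JSON object per line), they can be read with the Python standard library alone. The following is a minimal sketch; the helper name `read_jsonl` is hypothetical, and the path in the usage note assumes the file has already been downloaded to the working directory.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed instance per non-empty line of a .jsonl file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

For example, `instances = list(read_jsonl("DSCB-train-crowd-anno.jsonl"))` would load the crowd-annotated training partition into memory, assuming that local path exists.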
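Data Fields
The jsonl files contain the following fields:
- id: unique ID of the instance
- qid: ID of the question in the original dataset
- dataset: the dataset the question comes from
- question: full text of the question, including the answer options
- gold_answer: label of the correct answer to the question
- student_model: the student model used
- student_prompt: the prompt used for the student model (see DSCB-prompts.json for the actual prompts)
- student_llm_options: options used for the student model (such as temperature)
- student_answer: the answer predicted by the student model
- student_accuracy: whether the answer is correct (1 for correct, 0 for incorrect)
- student_explanation: text of the explanation given by the student model
- student_raw_output: raw output of the student model (parsed into student_answer and student_explanation)
- critiques: list of critiques of the student explanation, where each critique has the following fields:
  - critique_model: the critique model used
  - critique_llm_options: options used for the critique model (such as temperature)
  - critique_text: full text of the critique
  - critique_elements: the elements of the critique, namely main_flaw, dimension, general_feedback, specific_feedback, and explanation_score (a score from 0 to 5)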
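As an illustration of how these fields fit together, the sketch below computes per-model student accuracy and collects the explanation scores assigned by each critique model. It assumes instances are plain Python dicts with the fields listed above; the function names are hypothetical and not part of the dataset.

```python
from collections import defaultdict


def accuracy_by_student_model(instances):
    """Return {student_model: fraction of instances with student_accuracy == 1}."""
    totals = defaultdict(lambda: [0, 0])  # model -> [correct, seen]
    for inst in instances:
        bucket = totals[inst["student_model"]]
        bucket[0] += inst["student_accuracy"]
        bucket[1] += 1
    return {model: correct / seen for model, (correct, seen) in totals.items()}


def critique_explanation_scores(instance):
    """Return (critique_model, explanation_score) pairs for one instance."""
    return [
        (c["critique_model"], c["critique_elements"]["explanation_score"])
        for c in instance.get("critiques", [])
    ]
```

Keeping the two helpers separate reflects the nesting of the data: accuracy lives at the instance level, while explanation scores live inside each entry of `critiques`.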
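Some instances include human annotations, both at the explanation level and for each critique. Explanation-level annotations contain the following fields:
- explanation_score: the explanation score assigned by the worker
- dimensions: list of the major flaw dimensions identified by the worker
- worker: unique ID of each worker
Each critique may be annotated with the following fields:
- critique_score: the worker's rating of the quality of the critique (on a scale of 0 to 3)
- worker: unique ID of each worker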
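Because several workers can annotate the same instance, a common first step is to average their scores. The sketch below does this for one instance. It assumes the annotations are stored under the keys `explanation_annotations` (on the instance) and `critique_annotations` (on each critique), as they appear in the example record of this card; the function names are hypothetical.

```python
from statistics import mean


def mean_explanation_score(instance):
    """Average the workers' explanation_score values, or None if unannotated."""
    annotations = instance.get("explanation_annotations", [])
    if not annotations:
        return None
    return mean(a["explanation_score"] for a in annotations)


def mean_critique_scores(instance):
    """Return (critique_model, mean critique_score or None) for each critique."""
    results = []
    for critique in instance.get("critiques", []):
        annotations = critique.get("critique_annotations", [])
        score = mean(a["critique_score"] for a in annotations) if annotations else None
        results.append((critique["critique_model"], score))
    return results
```

Returning `None` for unannotated items keeps them distinguishable from items that workers genuinely scored as 0.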
Example
Below is a complete example from the DSCB-train-crowd-anno.jsonl partition:
```json
{
  "id": "ARCCH_Mercury_7218173$Llama-2-7b-chat$QA_reasoning_step1",
  "qid": "ARCCH_Mercury_7218173",
  "dataset": "ARC-Challenge",
  "question": "Which of these stars is most similar to the Sun? (A) the red giant star Arcturus (B) the white dwarf star Sirius B (C) the main sequence star Alpha Mensae (D) the blue supergiant star Rigel",
  "gold_answer": "C",
  "student_model": "Llama-2-7b-chat",
  "student_prompt": "QA_reasoning_step1",
  "student_llm_options": {"max_tokens": 512, "temperature": 0},
  "student_answer": "A",
  "student_accuracy": 0,
  "student_explanation": "1) The Sun is a main sequence star, which means it is in the middle of its life cycle and is fusing hydrogen into helium in its core.\n2) Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\n3) Sirius B is a white dwarf star, which means it has already exhausted its hydrogen fuel and has a much smaller core than the Sun.\n4) Rigel is a blue supergiant star, which means it is in a very different stage of evolution than the Sun and is much larger and hotter.",
  "student_raw_output": " Sure, I'd be happy to help! Here's the answer to the question:\nQuestion: Which of these stars is most similar to the Sun?\nReasoning:\n- The Sun is a main sequence star, which means it is in the middle of its life cycle and is fusing hydrogen into helium in its core.\n- Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\n- Sirius B is a white dwarf star, which means it has already exhausted its hydrogen fuel and has a much smaller core than the Sun.\n- Rigel is a blue supergiant star, which means it is in a very different stage of evolution than the Sun and is much larger and hotter.\nAnswer: (A) the red giant star Arcturus",
  "critiques": [
    {
      "critique_model": "gpt-4-0613",
      "critique_llm_options": {"max_tokens": 512, "temperature": 0},
      "critique_text": "The explanation states or suggests the following:\n- Main flaw (standalone statement): \"Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\"\n- Dimension: incorrect_information\nConsider these points for revising the explanation:\n- General: It's important to verify your facts before making a conclusion. Not all main sequence stars are similar to the Sun, and not all stars are main sequence stars.\n- Specific: In this case, Arcturus is not a main sequence star, but a red giant. Therefore, it is not similar to the Sun in terms of its stage of evolution. The correct answer should be Alpha Mensae, which is a main sequence star like the Sun.\nExplanation score: 2",
      "critique_elements": {
        "main_flaw": "\"Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\"",
        "dimension": "incorrect_information",
        "general_feedback": "It's important to verify your facts before making a conclusion. Not all main sequence stars are similar to the Sun, and not all stars are main sequence stars.",
        "specific_feedback": "In this case, Arcturus is not a main sequence star, but a red giant. Therefore, it is not similar to the Sun in terms of its stage of evolution. The correct answer should be Alpha Mensae, which is a main sequence star like the Sun.",
        "explanation_score": 2
      },
      "critique_annotations": [
        {"critique_score": 2, "worker": "Worker000124"},
        {"critique_score": 2, "worker": "Worker000111"},
        {"critique_score": 3, "worker": "Worker000119"}
      ]
    }
  ],
  "explanation_annotations": [
    {
      "explanation_score": 1,
      "dimensions": ["inconsistent_answer", "incorrect_reasoning", "lack_justification"],
      "worker": "Worker000124"
    },
    {
      "explanation_score": 1,
      "dimensions": ["incomplete_reasoning", "incorrect_information", "incorrect_reasoning", "misunderstanding"],
      "worker": "Worker000111"
    },
    {
      "explanation_score": 2,
      "dimensions": ["inconsistent_answer", "incorrect_information"],
      "worker": "Worker000119"
    }
  ]
}
```
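In the example above, the `id` value appears to be the `qid`, `student_model`, and `student_prompt` values joined by `$`. The sketch below splits an ID on that assumption. The pattern is inferred from this single record and is not documented here, so the hypothetical helper `split_instance_id` raises an error rather than guessing when an ID does not have exactly three parts.

```python
def split_instance_id(instance_id: str) -> dict:
    """Split an instance ID of the assumed form qid$student_model$student_prompt."""
    parts = instance_id.split("$")
    if len(parts) != 3:
        # The three-part layout is an assumption; fail loudly if it does not hold.
        raise ValueError(f"unexpected id layout: {instance_id!r}")
    qid, student_model, student_prompt = parts
    return {
        "qid": qid,
        "student_model": student_model,
        "student_prompt": student_prompt,
    }
```

Reading the `qid`, `student_model`, and `student_prompt` fields directly from the record is more robust; splitting the ID is mainly useful when only IDs are at hand.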
Citation
@misc{gu2024digitalsocrates,
  title={Digital {Socrates}: Evaluating {LLMs} through Explanation Critiques},
  author={Yuling Gu and Oyvind Tafjord and Peter Clark},
  year={2024},
  eprint={2311.09613},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}




