DS_Critique_Bank
Source: ModelScope community. Updated 2025-08-22; indexed 2025-05-31.
Download link: https://modelscope.cn/datasets/allenai/DS_Critique_Bank
# DS Critique Bank
DS (Digital Socrates) Critique Bank is a dataset of multiple-choice questions and associated answers and explanations
provided by "student models", along with "critiques" of the explanations provided by "critique models". Some of the
instances have human annotations. For details refer to our paper at https://aclanthology.org/2024.acl-long.302/.
The student models are: `gpt-4-0613`, `gpt-3.5-turbo-0613`, `Llama-2-70b-Chat`, and `Llama-2-7b-Chat`.
The critique models are: `gpt-4-0613`, `DS-13B`, and `DS-7B` (the latter two are Digital Socrates models
fine-tuned on the DSCB training data starting from Llama-2-Chat models).
The trained critique models can be accessed from the Hugging Face Model Hub.
The recommended model is [DS-13B](https://huggingface.co/allenai/digital-socrates-13b); a smaller
[DS-7B](https://huggingface.co/allenai/digital-socrates-7b) model is also available.
The following files are in the dataset:
* `DSCB-train-silver.jsonl`: 3240 instances with silver GPT-4 critiques
* `DSCB-train-crowd-anno.jsonl`: 3240 instances with human-annotated GPT-4 critiques
* `DSCB-train-expert.jsonl`: 198 instances with human-edited critiques
* `DSCB-dev-crowd-anno.jsonl`: 270 instances with human-annotated critiques from GPT-4, DS-13B, and DS-7B
* `DSCB-dev-non-anno.jsonl`: 6330 instances with critiques from GPT-4, DS-13B, and DS-7B
* `DSCB-prompts.json`: The prompts used for querying student model explanations and critique model critiques
The prompts have placeholders in double brackets, like `[[QUESTION]]`, for inserting the different variables.
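The placeholder convention described above can be illustrated with a minimal sketch. The helper name `fill_prompt`, the sample template, and the assumption that placeholder names consist of uppercase letters, digits, and underscores are all hypothetical; the actual templates and the layout of `DSCB-prompts.json` are not reproduced here.

```python
import re


def fill_prompt(template: str, variables: dict) -> str:
    """Replace each [[NAME]] placeholder with variables["NAME"]."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than leave an unfilled placeholder in a prompt.
            raise KeyError(f"no value supplied for placeholder [[{name}]]")
        return variables[name]

    # Assumed naming pattern for placeholders: uppercase, digits, underscores.
    return re.sub(r"\[\[([A-Z0-9_]+)\]\]", substitute, template)


# Hypothetical template for illustration; real prompts live in DSCB-prompts.json.
template = "Question: [[QUESTION]]\nExplain your reasoning, then give an answer."
filled = fill_prompt(
    template,
    {"QUESTION": "Which of these stars is most similar to the Sun?"},
)
print(filled)
```

Raising on a missing variable is a design choice of this sketch: it makes an incomplete prompt an error instead of a silently malformed query.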
The jsonl files have the following fields:
* `id`: Unique id of instance (combining `qid`, `student_model` and `student_prompt`)
* `qid`: Question id from original dataset
* `dataset`: Which dataset the question comes from
* `question`: Full text of question, with answer choices
* `gold_answer`: The label of the correct answer to the question
* `student_model`: Which student model was used
* `student_prompt`: Which prompt was used for student model (see `DSCB-prompts.json` for actual prompt)
* `student_llm_options`: Options (like temperature) used by student model
* `student_answer`: Answer predicted by student model
* `student_accuracy`: Whether answer is correct (1) or incorrect (0)
* `student_explanation`: Explanation text provided by student model
* `student_raw_output`: Raw output from student model (which was parsed into `student_answer` and `student_explanation`)
* `critiques`: A list of critiques of the student explanation, with the following fields for each critique:
  * `critique_model`: Which critique model was used
  * `critique_llm_options`: Options (like temperature) used by critique model
  * `critique_text`: The full text of the critique
  * `critique_elements`: A dictionary of the elements of the critique, namely `main_flaw`, `dimension`, `general_feedback`, `specific_feedback`, and `explanation_score` (number from 0 to 5)
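As a rough illustration of working with these fields, the sketch below reads JSONL lines and computes the mean `student_accuracy` per `student_model`. The function name and the toy records are made up for the example and are not taken from the dataset; only the field names come from the list above.

```python
import json
from collections import defaultdict


def accuracy_by_student_model(lines):
    """Return {student_model: mean student_accuracy} over an iterable of JSONL lines."""
    totals = defaultdict(lambda: [0, 0])  # model -> [num_correct, num_instances]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts = totals[record["student_model"]]
        counts[0] += record["student_accuracy"]
        counts[1] += 1
    return {model: correct / n for model, (correct, n) in totals.items()}


# Toy records for illustration only (not real dataset rows).
toy_lines = [
    json.dumps({"student_model": "Llama-2-7b-chat", "student_accuracy": 0}),
    json.dumps({"student_model": "Llama-2-7b-chat", "student_accuracy": 1}),
    json.dumps({"student_model": "gpt-4-0613", "student_accuracy": 1}),
]
print(accuracy_by_student_model(toy_lines))

# With a local copy of a data file, the same function accepts a file handle:
# with open("DSCB-train-silver.jsonl", encoding="utf-8") as f:
#     print(accuracy_by_student_model(f))
```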
In addition, some instances will have human annotations from crowd workers, both at the explanation level and for each critique.
Such instances have a top-level `explanation_annotations` field, which is a list of JSON objects with the following fields:
* `explanation_score`: Explanation score assigned by worker
* `dimensions`: A list of major flaw dimensions identified by worker
* `worker`: A unique ID associated with each worker
Each critique might also have a `critique_annotations` field, which is again a list of JSON objects with these fields:
* `critique_score`: The quality of the critique (on 0-3 scale) according to worker
* `worker`: A unique ID associated with each worker
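One way to use these annotations is to average the worker scores per instance. The sketch below is an assumption about how a consumer might aggregate them, not a procedure described by the authors; `summarize_annotations` is a hypothetical helper, and the record is trimmed to the annotation fields, with scores copied from the example record in this card.

```python
from statistics import mean


def summarize_annotations(record):
    """Average worker scores for the explanation and for each critique.

    Entries are None where an instance or critique carries no annotations.
    """
    summary = {"explanation_score": None, "critique_scores": []}
    explanation_annos = record.get("explanation_annotations", [])
    if explanation_annos:
        summary["explanation_score"] = mean(
            a["explanation_score"] for a in explanation_annos
        )
    for critique in record.get("critiques", []):
        annos = critique.get("critique_annotations", [])
        summary["critique_scores"].append(
            mean(a["critique_score"] for a in annos) if annos else None
        )
    return summary


# Trimmed record: only the annotation fields are kept.
record = {
    "critiques": [
        {
            "critique_model": "gpt-4-0613",
            "critique_annotations": [
                {"critique_score": 2, "worker": "Worker000124"},
                {"critique_score": 2, "worker": "Worker000111"},
                {"critique_score": 3, "worker": "Worker000119"},
            ],
        }
    ],
    "explanation_annotations": [
        {"explanation_score": 1, "worker": "Worker000124"},
        {"explanation_score": 1, "worker": "Worker000111"},
        {"explanation_score": 2, "worker": "Worker000119"},
    ],
}
summary = summarize_annotations(record)
print(summary)
```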
## Full example
Here is a full example from the `DSCB-train-crowd-anno.jsonl` partition:
```
{
"id":"ARCCH_Mercury_7218173$Llama-2-7b-chat$QA_reasoning_step1",
"qid":"ARCCH_Mercury_7218173",
"dataset":"ARC-Challenge",
"question":"Which of these stars is most similar to the Sun? (A) the red giant star Arcturus (B) the white dwarf star Sirius B (C) the main sequence star Alpha Mensae (D) the blue supergiant star Rigel",
"gold_answer":"C",
"student_model":"Llama-2-7b-chat",
"student_prompt":"QA_reasoning_step1",
"student_llm_options":{"max_tokens":512, "temperature":0},
"student_answer":"A",
"student_accuracy":0,
"student_explanation":"1) The Sun is a main sequence star, which means it is in the middle of its life cycle and is fusing hydrogen into helium in its core.\n2) Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\n3) Sirius B is a white dwarf star, which means it has already exhausted its hydrogen fuel and has a much smaller core than the Sun.\n4) Rigel is a blue supergiant star, which means it is in a very different stage of evolution than the Sun and is much larger and hotter.",
"student_raw_output":" Sure, I'd be happy to help! Here's the answer to the question:\n\nQuestion: Which of these stars is most similar to the Sun?\n\nReasoning:\n\n1) The Sun is a main sequence star, which means it is in the middle of its life cycle and is fusing hydrogen into helium in its core.\n2) Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\n3) Sirius B is a white dwarf star, which means it has already exhausted its hydrogen fuel and has a much smaller core than the Sun.\n4) Rigel is a blue supergiant star, which means it is in a very different stage of evolution than the Sun and is much larger and hotter.\n\nAnswer: (A) the red giant star Arcturus",
"critiques":[
{
"critique_model":"gpt-4-0613",
"critique_llm_options":{"max_tokens":512, "temperature":0},
"critique_text":"The explanation states or suggests the following:\n * Main flaw (standalone statement): \"Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\"\n * Dimension: incorrect_information\n\nConsider these points for revising the explanation:\n * General: It's important to verify your facts before making a conclusion. Not all main sequence stars are similar to the Sun, and not all stars are main sequence stars.\n * Specific: In this case, Arcturus is not a main sequence star, but a red giant. Therefore, it is not similar to the Sun in terms of its stage of evolution. The correct answer should be Alpha Mensae, which is a main sequence star like the Sun.\n\nExplanation score: 2",
"critique_elements":{
"main_flaw":"\"Arcturus and Alpha Mensae are also main sequence stars, meaning they are similar to the Sun in terms of their stage of evolution.\"",
"dimension":"incorrect_information",
"general_feedback":"It's important to verify your facts before making a conclusion. Not all main sequence stars are similar to the Sun, and not all stars are main sequence stars.",
"specific_feedback":"In this case, Arcturus is not a main sequence star, but a red giant. Therefore, it is not similar to the Sun in terms of its stage of evolution. The correct answer should be Alpha Mensae, which is a main sequence star like the Sun.",
"explanation_score":2
},
"critique_annotations":[
{"critique_score":2,"worker":"Worker000124"},
{"critique_score":2, "worker":"Worker000111"},
{"critique_score":3, "worker":"Worker000119"}
]
}
],
"explanation_annotations":[
{
"explanation_score":1,
"dimensions":["inconsistent_answer", "incorrect_reasoning", "lack_justification"],
"worker":"Worker000124"
},
{
"explanation_score":1,
"dimensions":["incomplete_reasoning", "incorrect_information", "incorrect_reasoning", "misunderstanding"],
"worker":"Worker000111"
},
{
"explanation_score":2,
"dimensions":["inconsistent_answer", "incorrect_information"],
"worker":"Worker000119"
}
]
}
```
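The example shows how `critique_text` relates to `critique_elements`. The sketch below recovers the elements from the text with regular expressions. The patterns are inferred from this single example, so they are an assumption: other critiques may be formatted differently, and this is not the authors' parsing code.

```python
import re

# Line-level patterns inferred from the example critique above (an assumption).
PATTERNS = {
    "main_flaw": r"\* Main flaw \(standalone statement\): (.*)",
    "dimension": r"\* Dimension: (.*)",
    "general_feedback": r"\* General: (.*)",
    "specific_feedback": r"\* Specific: (.*)",
    "explanation_score": r"Explanation score: (\d+)",
}


def parse_critique(text):
    """Extract whichever critique elements can be found in a critique text."""
    elements = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            value = match.group(1).strip()
            elements[key] = int(value) if key == "explanation_score" else value
    return elements


# The critique_text of the example record, split across string literals.
critique_text = (
    "The explanation states or suggests the following:\n"
    " * Main flaw (standalone statement): \"Arcturus and Alpha Mensae are also "
    "main sequence stars, meaning they are similar to the Sun in terms of their "
    "stage of evolution.\"\n"
    " * Dimension: incorrect_information\n\n"
    "Consider these points for revising the explanation:\n"
    " * General: It's important to verify your facts before making a conclusion. "
    "Not all main sequence stars are similar to the Sun, and not all stars are "
    "main sequence stars.\n"
    " * Specific: In this case, Arcturus is not a main sequence star, but a red "
    "giant. Therefore, it is not similar to the Sun in terms of its stage of "
    "evolution. The correct answer should be Alpha Mensae, which is a main "
    "sequence star like the Sun.\n\n"
    "Explanation score: 2"
)
parsed = parse_critique(critique_text)
print(parsed["dimension"], parsed["explanation_score"])
```

Keys whose pattern does not match are simply omitted from the result, so callers can detect critiques that deviate from the assumed layout.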
## Citation
```
@inproceedings{gu-etal-2024-digital,
title = "Digital Socrates: Evaluating {LLM}s through Explanation Critiques",
author = "Gu, Yuling and
Tafjord, Oyvind and
Clark, Peter",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.acl-long.302",
pages = "5559--5586",
}
```
Provider: maas
Created: 2025-05-27



