peterkchung/commonsense_cot_partial_annotated_v0.1
Commonsense QA CoT (Partial, Annotated) v0.1 Dataset Overview
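Dataset Summary
This dataset is a human-annotated subset of randomly sampled question-answer entries from the CommonsenseQA dataset (tau/commonsense_qa). The rationale for each question-answer pair was created in two steps: first, Mixtral (mistralai/Mixtral-8x7B-Instruct-v0.1) generated three unique CoT (chain-of-thought) explanations; then, human evaluation distilled the random sample into a coherent set of question-answer-rationale triplets. In most cases, a Mixtral-generated response was kept as the passing explanation for the question-answer pair.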
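The working hypothesis behind this dataset, inspired by prior research papers, is that a diverse set of CoT rationales passed along with the CommonsenseQA question-answer choices can accelerate commonsense reasoning performance, even on relatively small models (<3B parameters).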
Further refinement and annotation of this dataset are planned.
Dataset Structure
Languages
The dataset is in English (en).
Data Fields
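- id (str): Unique ID.
- question (str): The question.
- question_concept (str): The ConceptNet concept associated with the question.
- choices: A dictionary feature containing the following fields:
  - label (str): The choice label.
  - text (str): The choice text.
- answerKey (str): The answer.
- rationale (str): The rationale.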
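The field list above can be turned into a small sanity check. The following is a minimal sketch, not part of the dataset's tooling: `validate_record` and `STRING_FIELDS` are hypothetical names, and the check assumes that `choices` stores `label` and `text` as parallel lists of strings, as the example record in this card does.

```python
from typing import Any, Dict, List

# Field names are taken from the dataset card; the helper is hypothetical.
STRING_FIELDS = ("id", "question", "question_concept", "answerKey", "rationale")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems: List[str] = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected str")

    choices = record.get("choices")
    if not isinstance(choices, dict):
        problems.append("choices: expected dict")
        return problems

    # Assumption: 'label' and 'text' are parallel lists, one entry per option.
    labels = choices.get("label")
    texts = choices.get("text")
    if not isinstance(labels, list) or not isinstance(texts, list):
        problems.append("choices: 'label' and 'text' must be lists")
    elif len(labels) != len(texts):
        problems.append("choices: 'label' and 'text' differ in length")
    elif record.get("answerKey") not in labels:
        problems.append("answerKey: not among choice labels")
    return problems
```

An empty return value means the record matches the documented layout; any other result names the fields that deviate.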
Data Example
```json
{
  "id": "1fe48d12b6f6e4e38f4445f3ec60d5c5",
  "question": "What can happen to someone too sure of their learning?",
  "question_concept": "learning",
  "choices": {
    "label": ["A", "B", "C", "D", "E"],
    "text": ["growth", "gaining knowledge", "enlightenment", "knowing more", "overconfidence"]
  },
  "answerKey": "E",
  "rationale": "When someone is too sure of their learning, they become overconfident, thinking that they know everything. This can prevent them from learning more, as they stop seeking new knowledge and ideas. They might also miss out on enlightenment, as they close themselves off to new perspectives. Overall, their growth might be stunted, as they stop challenging themselves and expanding their understanding. So, out of the given choices, the most appropriate answer is overconfidence."
}
```
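One way to use a record like the one above is to render it as a plain-text prompt with the rationale placed before the answer. This is only an illustrative sketch: `format_example` is a hypothetical helper, and the prompt layout (the "Question:", "Rationale:", and "Answer:" prefixes) is an assumption, not a format prescribed by the dataset.

```python
from typing import Any, Dict


def format_example(record: Dict[str, Any], include_rationale: bool = True) -> str:
    """Render one record as a plain-text multiple-choice prompt."""
    choices = record["choices"]
    lines = [f"Question: {record['question']}"]
    # 'label' and 'text' are parallel lists in the example record.
    for label, text in zip(choices["label"], choices["text"]):
        lines.append(f"{label}. {text}")
    if include_rationale:
        # Put the CoT rationale before the answer so the reasoning leads to it.
        lines.append(f"Rationale: {record['rationale']}")
    lines.append(f"Answer: {record['answerKey']}")
    return "\n".join(lines)
```

Passing `include_rationale=False` yields the same prompt without the rationale line, which may be useful for comparing runs with and without CoT text.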
Source Data
- Dataset source: https://huggingface.co/datasets/tau/commonsense_qa
- Homepage: https://www.tau-nlp.org/commonsenseqa
- Repository: https://github.com/jonathanherzig/commonsenseqa
- Paper: https://arxiv.org/abs/1811.00937
Licensing Information
This dataset is released under the MIT License.



