declare-lab/cicero
Dataset Overview
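Name: CICERO
Description: CICERO is a dataset for commonsense reasoning in dialogues. It contains roughly 53,000 inference instances spanning five commonsense dimensions: cause, subsequent event, prerequisite, motivation, and emotional reaction. The instances were collected from about 5,600 dialogues. Two tasks are defined on the dataset, generative inference and multi-choice answer selection, to demonstrate its use for dialogue reasoning.
Supported tasks:
- Inference generation (NLG)
- Multi-choice answer selection (QA)
Language: English (BCP-47 code: en)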
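Dataset Structure
Data fields:
- ID: The dialogue ID, including an indicator of the source dataset.
- Dialogue: The list of utterances in the dialogue.
- Target: The target utterance.
- Question: One of five questions, each corresponding to an inference type.
- Choices: A list of five candidate answers. One is human-written; the other four are machine-generated and selected by an adversarial filtering algorithm.
- Human Written Answer: The index of the human-written answer, stored as a single-element list. Indexing starts at 0.
- Correct Answers: The indices of all answers that human annotators marked as plausible or speculatively correct. This list includes the index of the human-written answer.
Data instances: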
{ "ID": "daily-dialogue-1291", "Dialogue": [ "A: Hello , is there anything I can do for you ?", "B: Yes . I would like to check in .", "A: Have you made a reservation ?", "B: Yes . I am Belen .", "A: So your room number is 201 . Are you a member of our hotel ?", "B: No , whats the difference ?", "A: Well , we offer a 10 % charge for our members ." ], "Target": "Well , we offer a 10 % charge for our members .", "Question": "What subsequent event happens or could happen following the target?", "Choices": [ "For future discounts at the hotel, the listener takes a credit card at the hotel.", "The listener is not enrolled in a hotel membership.", "For future discounts at the airport, the listener takes a membership at the airport.", "For future discounts at the hotel, the listener takes a membership at the hotel.", "The listener doesnt have a membership to the hotel." ], "Human Written Answer": [ 3 ], "Correct Answers": [ 3 ] }
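The fields above map directly onto the two supported tasks. The following is a minimal sketch, not the authors' preprocessing code: it copies the sample instance into a Python dict, resolves the answer indices to text, and builds one plausible input format for each task. The function names and the layout of the source string are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# The sample instance shown above, copied field for field.
instance: Dict = {
    "ID": "daily-dialogue-1291",
    "Dialogue": [
        "A: Hello , is there anything I can do for you ?",
        "B: Yes . I would like to check in .",
        "A: Have you made a reservation ?",
        "B: Yes . I am Belen .",
        "A: So your room number is 201 . Are you a member of our hotel ?",
        "B: No , whats the difference ?",
        "A: Well , we offer a 10 % charge for our members .",
    ],
    "Target": "Well , we offer a 10 % charge for our members .",
    "Question": "What subsequent event happens or could happen following the target?",
    "Choices": [
        "For future discounts at the hotel, the listener takes a credit card at the hotel.",
        "The listener is not enrolled in a hotel membership.",
        "For future discounts at the airport, the listener takes a membership at the airport.",
        "For future discounts at the hotel, the listener takes a membership at the hotel.",
        "The listener doesnt have a membership to the hotel.",
    ],
    "Human Written Answer": [3],
    "Correct Answers": [3],
}


def correct_answer_texts(inst: Dict) -> List[str]:
    """Resolve the 0-based indices in "Correct Answers" to answer strings."""
    return [inst["Choices"][i] for i in inst["Correct Answers"]]


def generation_pair(inst: Dict) -> Tuple[str, str]:
    """Build a (source, target) text pair for inference generation.

    The source layout is a hypothetical choice; the target is the
    human-written answer.
    """
    context = " ".join(inst["Dialogue"])
    source = (
        "question: " + inst["Question"]
        + " target: " + inst["Target"]
        + " context: " + context
    )
    human_index = inst["Human Written Answer"][0]
    return source, inst["Choices"][human_index]


def selection_labels(inst: Dict) -> List[int]:
    """Return one 0/1 label per choice for multi-choice answer selection."""
    correct = set(inst["Correct Answers"])
    return [int(i in correct) for i in range(len(inst["Choices"]))]


if __name__ == "__main__":
    print(correct_answer_texts(instance))
    print(selection_labels(instance))
```

Because "Correct Answers" can hold more than one index, the selection labels are kept as a per-choice vector instead of a single class ID.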
Data splits:
- Training set: 31,418 instances
- Validation set: 10,888 instances
- Test set: 10,898 instances
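As a quick consistency check, the three split sizes can be summed and expressed as shares of the whole. The sketch below uses only the counts listed above; the split names are labels chosen here for illustration.

```python
# Split sizes as listed in this card.
splits = {"train": 31418, "validation": 10888, "test": 10898}

total = sum(splits.values())
shares = {name: count / total for name, count in splits.items()}

for name, count in splits.items():
    print(f"{name}: {count} instances ({shares[name]:.1%})")
print(f"total: {total}")
```

The total comes to 53,204 instances, which is consistent with the figure of roughly 53,000 given in the description.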
Dataset Creation
Source data:
- The dialogues come from three datasets: DailyDialog, DREAM, and MuTual.
Citation information:
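Since the ID field carries an indicator of the source dataset, instances can in principle be grouped by origin. The sketch below assumes that every ID has the form prefix-number, as in the sample ID "daily-dialogue-1291". Only that one ID appears in this card, so the prefixes used for DREAM and MuTual are not shown here and the helper `split_id` is hypothetical.

```python
from typing import Tuple


def split_id(instance_id: str) -> Tuple[str, str]:
    """Split an ID at its last hyphen into (source prefix, dialogue number).

    Assumes the ID ends in "-<number>"; the number is kept as a string.
    """
    source, _, number = instance_id.rpartition("-")
    return source, number


if __name__ == "__main__":
    print(split_id("daily-dialogue-1291"))
```

Splitting at the last hyphen keeps multi-word prefixes such as "daily-dialogue" intact.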
@inproceedings{ghosal2022cicero, title={CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues}, author={Ghosal, Deepanway and Shen, Siqi and Majumder, Navonil and Mihalcea, Rada and Poria, Soujanya}, booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)}, pages={5010--5028}, year={2022} }



