declare-lab/cicero

Name: declare-lab/cicero
Creator: declare-lab
Published: 2022-05-31 04:30:37
License: Not specified

Source: Hugging Face. Updated: 2022-05-31. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/declare-lab/cicero

Overview:

CICERO is a dataset for conversational reasoning. It contains 53,000 inferences across five commonsense dimensions (cause, subsequent event, prerequisite, motivation, and emotional reaction), collected from 5,600 dialogues. Two tasks, generative inference and multiple-choice answer selection, are defined on the dataset to demonstrate its usefulness for conversational reasoning. All text is in English. The dialogues are drawn from three existing datasets: DailyDialog, DREAM, and MuTual. Answer options combine human-written answers with machine-generated alternatives selected by an adversarial filtering algorithm.

Provider:

declare-lab

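Summary of original information

Dataset overview

Name: CICERO

Description: CICERO is a dataset for conversational reasoning containing 53,000 inference instances that cover five commonsense dimensions: cause, subsequent event, prerequisite, motivation, and emotional reaction. The inferences were collected from 5,600 dialogues, and the dataset defines generative inference and multiple-choice answer selection tasks to demonstrate its use in conversational reasoning.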

Supported tasks:

Inference generation (NLG)
Multiple-choice answer selection (QA)

Language: English (BCP-47 code: en)

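Dataset structure

Data fields:

ID: dialogue ID, with an indicator of the source dataset.
Dialogue: the list of utterances in the dialogue.
Target: the target utterance.
Question: one of five questions, indicating the inference type.
Choices: a list of five candidate answers. One is written by a human; the other four are machine-generated and selected by an adversarial filtering algorithm.
Human Written Answer: index of the human-written answer. Indices start at 0.
Correct Answers: list of the indices of all correct answers, meaning those that human annotators marked as plausible or speculatively correct. It includes the index of the human-written answer.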
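The field descriptions above imply a few invariants that can be checked mechanically. The following is a minimal sketch, not part of the dataset's own tooling: the `CiceroRecord` class and `parse_record` function are hypothetical names, and the checks assume that `Human Written Answer` and `Correct Answers` are both lists of 0-based indices into `Choices`, as in the data instance on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CiceroRecord:
    """One CICERO instance, mirroring the documented fields (hypothetical helper)."""
    id: str
    dialogue: List[str]
    target: str
    question: str
    choices: List[str]
    human_written_answer: List[int]
    correct_answers: List[int]


def parse_record(raw: dict) -> CiceroRecord:
    """Build a CiceroRecord from a raw dict and check the documented invariants."""
    rec = CiceroRecord(
        id=raw["ID"],
        dialogue=list(raw["Dialogue"]),
        target=raw["Target"],
        question=raw["Question"],
        choices=list(raw["Choices"]),
        human_written_answer=list(raw["Human Written Answer"]),
        correct_answers=list(raw["Correct Answers"]),
    )
    # The card states that every instance has exactly five candidate answers.
    if len(rec.choices) != 5:
        raise ValueError(f"{rec.id}: expected 5 choices, got {len(rec.choices)}")
    # Every answer index must point at an existing choice (0-based).
    for idx in rec.human_written_answer + rec.correct_answers:
        if not 0 <= idx < len(rec.choices):
            raise ValueError(f"{rec.id}: answer index {idx} out of range")
    # The card states that the correct answers include the human-written one.
    if not set(rec.human_written_answer) <= set(rec.correct_answers):
        raise ValueError(f"{rec.id}: human-written answer not in correct answers")
    return rec
```

Raising on the first violated invariant keeps malformed rows out of training data early; a more lenient loader could collect the errors instead.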

Data instance:

{
  "ID": "daily-dialogue-1291",
  "Dialogue": [
    "A: Hello , is there anything I can do for you ?",
    "B: Yes . I would like to check in .",
    "A: Have you made a reservation ?",
    "B: Yes . I am Belen .",
    "A: So your room number is 201 . Are you a member of our hotel ?",
    "B: No , whats the difference ?",
    "A: Well , we offer a 10 % charge for our members ."
  ],
  "Target": "Well , we offer a 10 % charge for our members .",
  "Question": "What subsequent event happens or could happen following the target?",
  "Choices": [
    "For future discounts at the hotel, the listener takes a credit card at the hotel.",
    "The listener is not enrolled in a hotel membership.",
    "For future discounts at the airport, the listener takes a membership at the airport.",
    "For future discounts at the hotel, the listener takes a membership at the hotel.",
    "The listener doesnt have a membership to the hotel."
  ],
  "Human Written Answer": [ 3 ],
  "Correct Answers": [ 3 ]
}
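To illustrate how one instance can serve both supported tasks, here is a hedged sketch that turns the instance above into a generation pair and a multiple-choice prompt. The prompt layout and the function names (`build_context`, `to_generation_pair`, `to_multiple_choice`) are assumptions made for illustration; they are not the input format used by the dataset authors.

```python
from typing import Dict, List, Tuple

# The data instance shown above, copied verbatim.
EXAMPLE: Dict = {
    "ID": "daily-dialogue-1291",
    "Dialogue": [
        "A: Hello , is there anything I can do for you ?",
        "B: Yes . I would like to check in .",
        "A: Have you made a reservation ?",
        "B: Yes . I am Belen .",
        "A: So your room number is 201 . Are you a member of our hotel ?",
        "B: No , whats the difference ?",
        "A: Well , we offer a 10 % charge for our members .",
    ],
    "Target": "Well , we offer a 10 % charge for our members .",
    "Question": "What subsequent event happens or could happen following the target?",
    "Choices": [
        "For future discounts at the hotel, the listener takes a credit card at the hotel.",
        "The listener is not enrolled in a hotel membership.",
        "For future discounts at the airport, the listener takes a membership at the airport.",
        "For future discounts at the hotel, the listener takes a membership at the hotel.",
        "The listener doesnt have a membership to the hotel.",
    ],
    "Human Written Answer": [3],
    "Correct Answers": [3],
}


def build_context(record: Dict) -> str:
    """Join dialogue, target, and question into one text (assumed layout)."""
    dialogue = "\n".join(record["Dialogue"])
    return (
        f"Dialogue:\n{dialogue}\n"
        f"Target: {record['Target']}\n"
        f"Question: {record['Question']}"
    )


def to_generation_pair(record: Dict) -> Tuple[str, str]:
    """Generation task: the context is the input, the human-written answer the target."""
    human_idx = record["Human Written Answer"][0]
    return build_context(record), record["Choices"][human_idx]


def to_multiple_choice(record: Dict) -> Tuple[str, List[int]]:
    """Selection task: the context plus numbered options, labelled with all correct indices."""
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(record["Choices"]))
    prompt = f"{build_context(record)}\nChoices:\n{options}"
    return prompt, sorted(record["Correct Answers"])


source, target = to_generation_pair(EXAMPLE)
prompt, labels = to_multiple_choice(EXAMPLE)
print(target)  # For future discounts at the hotel, the listener takes a membership at the hotel.
print(labels)  # [3]
```

Because `Correct Answers` can hold more than one index, the selection label is kept as a list rather than collapsed to a single integer.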

Data splits:

Training set: 31,418 instances
Validation set: 10,888 instances
Test set: 10,898 instances
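As a quick consistency check on the split sizes, the short sketch below sums them and computes each split's share. The variable names are illustrative only. The total of 53,204 is consistent with the figure of roughly 53,000 inferences quoted in the description.

```python
# Split sizes as listed on this page.
SPLITS = {"train": 31_418, "validation": 10_888, "test": 10_898}

total = sum(SPLITS.values())

# Percentage of all instances in each split, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in SPLITS.items()}

print(total)   # 53204
print(shares)  # {'train': 59.1, 'validation': 20.5, 'test': 20.5}
```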

Dataset creation

Source data:

The dialogues come from three datasets: DailyDialog, DREAM, and MuTual.

Citation:

@inproceedings{ghosal2022cicero,
  title={CICERO: A Dataset for Contextualized Commonsense Inference in Dialogues},
  author={Ghosal, Deepanway and Shen, Siqi and Majumder, Navonil and Mihalcea, Rada and Poria, Soujanya},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={5010--5028},
  year={2022}
}
