mvansegbroeck/commonsense-dialogues
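Commonsense-Dialogues Dataset
Overview
Commonsense-Dialogues is a crowdsourced dataset of about 11,000 dialogues that are grounded in social contexts and involve the use of commonsense. The social contexts are taken from the training set of SocialIQA, a multiple-choice question-answering benchmark for social commonsense reasoning.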
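Data Collection
To collect Commonsense-Dialogues, each participant was given a social context and asked to write a dialogue of 4 to 6 turns based on the events in that context. The participant alternated between two roles: the individual mentioned in the context and a third-party friend.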
Data Example
```json
{
  "1": {
    "context": "Sydney met Carson's mother for the first time last week. He liked her.",
    "speaker": "Sydney",
    "turns": [
      "I met Carson's mother last week for the first time.",
      "How was she?",
      "She turned out to be really nice. I like her.",
      "That's good to hear.",
      "It is, especially since Carson and I are getting serious.",
      "Well, at least you'll like your in-law if you guys get married."
    ]
  },
  "2": {
    "context": "Kendall had a party at Jordan's house but was found out to not have asked and just broke in.",
    "speaker": "Kendall",
    "turns": [
      "Did you hear about my party this weekend at Jordan\u2019s house?",
      "I heard it was amazing, but that you broke in.",
      "That was a misunderstanding, I had permission to be there.",
      "Who gave you permission?",
      "I talked to Jordan about it months ago before he left town to go to school, but he forgot to tell his roommates about it.",
      "Ok cool, I hope everything gets resolved."
    ]
  }
}
```
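The record layout can be read with any JSON parser. The sketch below pairs each turn with a speaker name. It assumes that the "speaker" field names whoever opens the dialogue and that the two roles strictly alternate, which matches the two examples here but is not stated as a guarantee. The function name `label_turns` and the label "Friend" are hypothetical, chosen only for illustration.

```python
import json

# A trimmed inline sample in the same shape as the example records.
SAMPLE = """
{
  "1": {
    "context": "Sydney met Carson's mother for the first time last week. He liked her.",
    "speaker": "Sydney",
    "turns": [
      "I met Carson's mother last week for the first time.",
      "How was she?",
      "She turned out to be really nice. I like her."
    ]
  }
}
"""


def label_turns(dialogue, other="Friend"):
    """Pair each turn with a speaker name.

    Assumption: dialogue["speaker"] speaks first and the roles alternate.
    "Friend" is a placeholder label, not a field in the dataset.
    """
    first = dialogue["speaker"]
    return [
        (first if i % 2 == 0 else other, turn)
        for i, turn in enumerate(dialogue["turns"])
    ]


dialogues = json.loads(SAMPLE)
labeled = label_turns(dialogues["1"])
for name, text in labeled:
    print(f"{name}: {text}")
```

Keeping the labels separate from the raw turns leaves the original record untouched, so the same helper can feed either a speaker-tagged prompt format or a plain list of utterances.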
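Data Distribution
The dataset is provided in the /data directory: train.json contains about 9,000 dialogues, and valid.json and test.json contain about 1,000 dialogues each. Because every context comes from the SocialIQA training set, take care when running multi-task training and evaluation so that comparisons stay fair and accurate.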
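A minimal sketch of loading the three split files, assuming each file is a single JSON object keyed by dialogue id, as in the data example. The helper names `load_split` and `load_all` and the default relative path `data` are assumptions for illustration; adjust the path to wherever the /data directory lives locally.

```python
import json
from pathlib import Path

SPLITS = ("train", "valid", "test")


def load_split(data_dir, split):
    """Load one split file, e.g. <data_dir>/train.json, into a dict."""
    path = Path(data_dir) / f"{split}.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def load_all(data_dir="data"):
    """Load every split that exists under data_dir; missing files are skipped."""
    loaded = {}
    for split in SPLITS:
        if (Path(data_dir) / f"{split}.json").exists():
            loaded[split] = load_split(data_dir, split)
    return loaded
```

Calling `load_all("data")` returns a dict such as `{"train": {...}, "valid": {...}, "test": {...}}`. When combining this data with SocialIQA, it may be worth comparing the "context" strings against the SocialIQA split being evaluated, since all contexts here originate from the SocialIQA training set.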
Data Statistics
| Statistic | Train | Valid | Test |
|---|---|---|---|
| Number of dialogues | 9058 | 1157 | 1158 |
| Average turns per dialogue | 5.72 | 5.72 | 5.71 |
| Average words per turn | 12.4 | 12.4 | 12.2 |
| Number of distinct SocialIQA contexts used | 3672 | 483 | 473 |
| Average dialogues per SocialIQA context | 2.46 | 2.395 | 2.45 |
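The quantities in this table can be recomputed from a loaded split along the following lines. This is a sketch under stated assumptions: words are counted by whitespace splitting, and contexts are treated as distinct when their strings differ. The source does not say how words were counted, so the averages may not match the table exactly. The function name `split_stats` and the toy records are hypothetical.

```python
def split_stats(dialogues):
    """Compute table-style statistics for one split.

    `dialogues` is a dict of records with "context" and "turns" fields.
    Assumption: words are counted by whitespace splitting.
    """
    n = len(dialogues)
    turns = [t for d in dialogues.values() for t in d["turns"]]
    contexts = {d["context"] for d in dialogues.values()}
    return {
        "dialogues": n,
        "avg_turns": len(turns) / n,
        "avg_words_per_turn": sum(len(t.split()) for t in turns) / len(turns),
        "distinct_contexts": len(contexts),
        "dialogues_per_context": n / len(contexts),
    }


# Toy records (not from the dataset): two dialogues sharing one context.
toy = {
    "1": {"context": "A met B.", "speaker": "A",
          "turns": ["Hi there", "Hello"]},
    "2": {"context": "A met B.", "speaker": "A",
          "turns": ["How are you today", "Fine", "Good", "Bye now"]},
}
stats = split_stats(toy)
print(stats)
```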
License
This dataset is released under the CC-BY-NC 4.0 license.
Citation
If you use this dataset, please cite the following paper:
@inproceedings{zhou-etal-2021-commonsense,
    title = "Commonsense-Focused Dialogues for Response Generation: An Empirical Study",
    author = "Zhou, Pei and Gopalakrishnan, Karthik and Hedayatnia, Behnam and Kim, Seokhwan and Pujara, Jay and Ren, Xiang and Liu, Yang and Hakkani-Tur, Dilek",
    booktitle = "Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue",
    year = "2021",
    address = "Singapore and Online",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2109.06427"
}




