
mvansegbroeck/commonsense-dialogues

Hugging Face, updated 2023-08-30, indexed 2024-06-15
Download link:
https://hf-mirror.com/datasets/mvansegbroeck/commonsense-dialogues
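Resource summary:
Commonsense-Dialogues is a crowdsourced dataset of about 11K dialogues grounded in social situations, in which participants apply social commonsense. Each dialogue has 4-6 turns and is an exchange between the person in the social situation and a third-party friend. The social situations are taken from the training set of the SocialIQA dataset. The dataset is divided into training, validation, and test sets containing about 9K, 1K, and 1K dialogues, respectively.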
Provided by:
mvansegbroeck
Summary of Original Information

Commonsense-Dialogues Dataset

Overview

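Commonsense-Dialogues is a crowdsourced dataset of about 11,000 dialogues that are grounded in social contexts and involve the use of commonsense. The social contexts come from the training set of SocialIQA, a multiple-choice question-answering benchmark for social commonsense reasoning.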

Data Collection

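To collect Commonsense-Dialogues, each participant was given a social context and asked to write a 4-6 turn dialogue based on the events it describes. Participants alternated between the roles of the person mentioned in the context and a third-party friend.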

Data Example

```json
{
  "1": {
    "context": "Sydney met Carson's mother for the first time last week. He liked her.",
    "speaker": "Sydney",
    "turns": [
      "I met Carson's mother last week for the first time.",
      "How was she?",
      "She turned out to be really nice. I like her.",
      "That's good to hear.",
      "It is, especially since Carson and I are getting serious.",
      "Well, at least you'll like your in-law if you guys get married."
    ]
  },
  "2": {
    "context": "Kendall had a party at Jordan's house but was found out to not have asked and just broke in.",
    "speaker": "Kendall",
    "turns": [
      "Did you hear about my party this weekend at Jordan\u2019s house?",
      "I heard it was amazing, but that you broke in.",
      "That was a misunderstanding, I had permission to be there.",
      "Who gave you permission?",
      "I talked to Jordan about it months ago before he left town to go to school, but he forgot to tell his roommates about it.",
      "Ok cool, I hope everything gets resolved."
    ]
  }
}
```
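The record layout above can be read with a few lines of Python. This is a minimal sketch, not tooling that ships with the dataset. Based on the two examples, it assumes that the person named in `speaker` writes the first turn and that the roles then alternate with the friend. `label_turns`, `has_valid_length`, and the `"Friend"` label are hypothetical names introduced here.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical label for the friend's side; the data only names the context person.
FRIEND = "Friend"


def label_turns(record: Dict) -> List[Tuple[str, str]]:
    """Pair each turn with a role, assuming the named speaker opens and roles alternate."""
    roles = (record["speaker"], FRIEND)
    return [(roles[i % 2], text) for i, text in enumerate(record["turns"])]


def has_valid_length(record: Dict, lo: int = 4, hi: int = 6) -> bool:
    """Check that a dialogue falls in the 4-6 turn range given for the collection task."""
    return lo <= len(record["turns"]) <= hi


# A shortened copy of dialogue "1" from the example above.
raw = """{
  "1": {
    "context": "Sydney met Carson's mother for the first time last week. He liked her.",
    "speaker": "Sydney",
    "turns": [
      "I met Carson's mother last week for the first time.",
      "How was she?",
      "She turned out to be really nice. I like her.",
      "That's good to hear."
    ]
  }
}"""

dialogues = json.loads(raw)
for dialogue_id, record in dialogues.items():
    assert has_valid_length(record), f"dialogue {dialogue_id} has an unexpected turn count"
    for role, text in label_turns(record):
        # Prints lines such as "[1] Sydney: I met Carson's mother ..."
        print(f"[{dialogue_id}] {role}: {text}")
```

Because the JSON is keyed by dialogue ID rather than stored as a list, iterating over `.items()` keeps each ID next to its turns.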

Data Splits

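The data is in the /data directory. train.json contains about 9,000 dialogues, and valid.json and test.json each contain about 1,000 dialogues. All contexts come from the SocialIQA training set, so take care when using this dataset alongside SocialIQA for multi-task training or evaluation, to keep the results fair and accurate.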
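Below is a sketch for loading the three split files described above and for checking context overlap before combining the data with SocialIQA. The directory name `data` and the helpers `load_splits` and `shared_contexts` are illustrative assumptions. The overlap check uses exact string matching, so it may miss contexts whose text differs slightly between the two datasets.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Set

# File names as listed above; the directory location is an assumption.
SPLITS = ("train", "valid", "test")


def load_splits(data_dir: str = "data") -> Dict[str, Dict[str, dict]]:
    """Read each split file into a dict keyed by dialogue ID."""
    root = Path(data_dir)
    splits: Dict[str, Dict[str, dict]] = {}
    for name in SPLITS:
        with open(root / f"{name}.json", encoding="utf-8") as f:
            splits[name] = json.load(f)
    return splits


def shared_contexts(dialogues: Dict[str, dict], other_contexts: Iterable[str]) -> Set[str]:
    """Return contexts of these dialogues that also appear, verbatim, in another dataset."""
    own = {record["context"] for record in dialogues.values()}
    return own & set(other_contexts)
```

For example, `load_splits("path/to/data")["train"]` should hold roughly 9,000 dialogues keyed by ID. Passing the contexts of a SocialIQA split you plan to train on as `other_contexts` lists the Commonsense-Dialogues contexts that split already covers.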

Statistics

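| Statistic | Train | Valid | Test |
|---|---|---|---|
| Number of dialogues | 9058 | 1157 | 1158 |
| Average turns per dialogue | 5.72 | 5.72 | 5.71 |
| Average words per turn | 12.4 | 12.4 | 12.2 |
| Distinct SocialIQA contexts used | 3672 | 483 | 473 |
| Average dialogues per SocialIQA context | 2.46 | 2.395 | 2.45 |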
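The statistics in the table can be recomputed from a parsed split, which is a quick way to confirm that a local copy matches. This sketch is not the authors' script. `split_stats` is a hypothetical helper, words are counted by splitting on whitespace (the original tokenization is not stated), and distinct contexts are counted by exact string match.

```python
from typing import Dict


def split_stats(dialogues: Dict[str, dict]) -> Dict[str, float]:
    """Recompute the table's per-split statistics from one parsed split file."""
    records = list(dialogues.values())
    if not records:
        raise ValueError("split contains no dialogues")
    turns = [turn for record in records for turn in record["turns"]]
    # Assumption: a "word" is a whitespace-separated token.
    word_counts = [len(turn.split()) for turn in turns]
    # Assumption: identical context strings correspond to the same SocialIQA context.
    contexts = {record["context"] for record in records}
    n = len(records)
    return {
        "dialogues": n,
        "avg_turns_per_dialogue": len(turns) / n,
        "avg_words_per_turn": sum(word_counts) / len(turns) if turns else 0.0,
        "distinct_contexts": len(contexts),
        "avg_dialogues_per_context": n / len(contexts),
    }
```

The last table row is the first row divided by the fourth: 9058 / 3672, 1157 / 483, and 1158 / 473 give about 2.467, 2.395, and 2.448. These match the table up to rounding, although the training value of 2.46 appears to be truncated rather than rounded.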

License

This dataset is released under the CC-BY-NC 4.0 license.

Citation

If you use this dataset, please cite the following paper:

@inproceedings{zhou-etal-2021-commonsense,
    title = "Commonsense-Focused Dialogues for Response Generation: An Empirical Study",
    author = "Zhou, Pei and Gopalakrishnan, Karthik and Hedayatnia, Behnam and Kim, Seokhwan and Pujara, Jay and Ren, Xiang and Liu, Yang and Hakkani-Tur, Dilek",
    booktitle = "Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue",
    year = "2021",
    address = "Singapore and Online",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2109.06427"
}

Collected Summary
Dataset Introduction
Background and Challenges
Background Overview
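Commonsense-Dialogues is a crowdsourced dataset of about 11,000 dialogues grounded in social commonsense, each consisting of 4-6 turns, intended for text generation and natural language processing tasks. It is built on social contexts from SocialIQA and aims to support research on commonsense reasoning and dialogue response generation. The data is split into training, validation, and test subsets, stored as JSON, and written in English.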
The above content was collected and summarized by 遇见数据集.