ConvAI2 (Conversational Intelligence Challenge 2)
OpenDataLab | Updated: 2026-05-24 | Indexed: 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/ConvAI2
Description:
The ConvAI2 NeurIPS competition aims to find ways of building high-quality dialogue agents that can hold meaningful open-domain conversations. The ConvAI2 dataset used to train models is based on the PERSONA-CHAT dataset. Each speaker in a pair is assigned a persona from a pool of 1,155 possible personas (at training time), each consisting of at least 5 profile sentences; 100 never-before-seen personas are set aside for validation. As the original PERSONA-CHAT test set had been released, crowdworkers created a new hidden test set with 100 new personas and over 1,015 dialogues. To keep models from exploiting trivial word overlap, additional revised versions of the same training and test personas were crowdsourced, in which each sentence is a rephrasing, generalization, or specialization of the original, making the task more challenging. For example, "I just got my nails done" was revised to "I love pampering myself regularly", and "I'm on a diet right now" was revised to "I need to lose weight". The training, validation, and hidden test sets contain 17,878, 1,000, and 1,015 dialogues, respectively.
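To make the "trivial word overlap" point concrete, the sketch below measures how few tokens the two revised persona sentences quoted above share with their originals. It is only an illustration: the tokenizer, the choice of Jaccard overlap as the measure, and the names `tokens`, `jaccard_overlap`, and `PAIRS` are assumptions made here, not part of the ConvAI2 tooling or evaluation.

```python
import re

def tokens(sentence: str) -> set[str]:
    """Lowercase a sentence and return its set of word tokens.

    Assumption: a simple regex tokenizer that keeps apostrophes,
    so "I'm" stays a single token.
    """
    return set(re.findall(r"[a-z']+", sentence.lower()))

def jaccard_overlap(a: str, b: str) -> float:
    """Jaccard overlap: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# (original, revised) persona sentences quoted in the description above.
PAIRS = [
    ("I just got my nails done", "I love pampering myself regularly"),
    ("I'm on a diet right now", "I need to lose weight"),
]

for original, revised in PAIRS:
    score = jaccard_overlap(original, revised)
    print(f"{score:.2f}  {original!r} -> {revised!r}")
```

Under this tokenizer the first pair shares only the token "i" and the second pair shares none, which is why a model that relies on matching surface words between persona and dialogue gets little help from the revised personas.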
Provided by:
OpenDataLab
Created:
2022-08-16
The content above was collected and summarized by 遇见数据集.



