LUGE-Dialogue Open-Domain Dialogue Dataset Collection

Qianyan (千言) datasets, indexed 2024-05-15
Download link:
https://www.luge.ai/#/luge/dataDetail?id=26
Overview:
This dataset collection is designed to evaluate how well a single, unified generative model handles different dialogue skills, measured by content richness, multi-turn coherence, knowledge accuracy, and dialogue initiative. It gathers a series of publicly available open-domain dialogue datasets, curates them in a uniform way, and provides a unified evaluation method, so that models can be assessed across multiple skills and multiple domains. The collection is open source and is intended to give researchers and developers a platform for academic and technical exchange, to advance research on open-domain dialogue, and to promote the application and development of natural language understanding and artificial intelligence technologies.

We have also collected and provided open-source Chinese dialogue data, which participating teams can use to build their own dialogue models:

1. Knowledge dialogue data: Baidu's DuConv [1].
2. Recommendation dialogue data: Baidu's DuRecDial [2].
3. Persona dialogue data: Baidu's persona dataset (DuPersona).
4. Other dialogue data: Huawei's Weibo data [3], the Douban multi-turn dialogue data from Beihang University and Microsoft [4], Tsinghua University's LCCC dataset [5], Tsinghua University's emotional dialogue data [6], Tencent's retrieval-assisted generation dialogue dataset [7], and Tsinghua University's KdConv [8].
Provided by:
Baidu, Tsinghua University, Langboat Technology (澜舟科技), Tencent