CGoDial
arXiv. Updated 2022-11-22; indexed 2024-06-21.
Download link:
https://github.com/AlibabaResearch/DAMO-ConvAI/cgodial
Resource overview:
CGoDial is a large-scale Chinese benchmark for evaluating goal-oriented dialogue systems. It contains 96,763 dialogue sessions and 574,949 dialogue turns in total, spanning datasets built on three distinct knowledge sources. To narrow the gap between academic benchmarks and real spoken-dialogue scenarios, the data is either collected from real conversations or augmented with spoken-language features through crowdsourcing. CGoDial is designed to evaluate a model's general prediction, fast adaptability, and reliable robustness under realistic conditions, and it can be used to assess dialogue modeling with a range of Chinese pre-trained language models.
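As a quick sanity check on the figures quoted above, the following minimal sketch derives the average number of turns per session from the two stated totals. It uses only the aggregate numbers given in this description; the constant and function names are illustrative, and the result says nothing about how turns are distributed across the three constituent datasets.

```python
# Aggregate figures quoted in the description above.
NUM_SESSIONS = 96_763
NUM_TURNS = 574_949


def average_turns_per_session(num_turns: int, num_sessions: int) -> float:
    """Return the mean number of dialogue turns per session."""
    if num_sessions <= 0:
        raise ValueError("num_sessions must be positive")
    return num_turns / num_sessions


avg_turns = average_turns_per_session(NUM_TURNS, NUM_SESSIONS)
print(f"Average turns per session: {avg_turns:.2f}")
```

On these totals, a session averages just under six turns.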
Provided by:
Alibaba Group
Created:
2022-11-22
Summary
Dataset introduction

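Background and challenges
Background overview
CGoDial is a large-scale Chinese benchmark for evaluating goal-oriented dialogue. It contains more than 96,000 dialogue sessions and more than 570,000 dialogue turns across three knowledge sources, and uses crowdsourcing to collect real dialogue data that reflects spoken scenarios. The dataset is intended to evaluate models on general prediction, fast adaptability, and robustness, and is suited to research on dialogue modeling with Chinese pre-trained language models.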



