MSDialog
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/NTMC-Community/MatchZoo
Resource overview:
MSDialog is sourced from Microsoft's online product forums and covers discussions across multiple domains. It contains a large number of conversations and utterances and is designed specifically for evaluating dialogue response ranking. The dataset supports a pairwise ranking setting in which each ground-truth response is compared against nine candidate responses. Conversations are limited to at most ten turns, and further restrictions apply to utterance sequence length and to domain information. In total the dataset comprises over 35,000 conversations and 337,000 utterances: the training set contains more than 173,000 samples, the validation set 37,000, and the test set 35,000.
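The ranking setup described above (one ground-truth response scored alongside nine other candidates) can be illustrated with a small sketch. The metric choice (Recall@k and reciprocal rank), the function names, and the example scores are all assumptions made for illustration; the description does not state which metrics or data format the dataset's authors use.

```python
from typing import Sequence


def rank_candidates(scores: Sequence[float]) -> list:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """1.0 if the ground-truth response (label 1) is in the top k, else 0.0."""
    top_k = rank_candidates(scores)[:k]
    return 1.0 if any(labels[i] == 1 for i in top_k) else 0.0


def reciprocal_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """1 / rank of the ground-truth response, or 0.0 if none is labeled."""
    for rank, i in enumerate(rank_candidates(scores), start=1):
        if labels[i] == 1:
            return 1.0 / rank
    return 0.0


# Hypothetical candidate group for one dialogue context:
# index 0 is the ground-truth response, indices 1-9 are the other candidates.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Made-up model scores, one per candidate (not real model output).
scores = [0.72, 0.10, 0.85, 0.33, 0.05, 0.41, 0.20, 0.15, 0.60, 0.02]

r1 = recall_at_k(scores, labels, k=1)
r2 = recall_at_k(scores, labels, k=2)
rr = reciprocal_rank(scores, labels)
print(r1, r2, rr)
```

In this made-up group the ground-truth response is scored second highest, so it is missed at k=1 and found at k=2. Averaging these per-group values over all dialogue contexts would give corpus-level figures.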
Provider:
Microsoft

The content above was collected and summarized by 遇见数据集.



