Twitter Dialog Corpus
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/Marsan-Ma-zz/chat_corpus
Description:
This dataset was collected from Twitter and initially contained 2.6 million (message, reply) pairs. The pairs were then filtered by length limits on both the message history and the reply, leaving 2 million samples. The resulting corpus is intended for open-domain dialogue generation.
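The length-based filtering described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the curators' actual pipeline. The source gives neither the length thresholds nor the tokenization nor the file layout, so several pieces here are hypothetical: whitespace tokenization, the limits `max_msg` and `max_reply`, the helper names `pairs_from_lines` and `filter_pairs`, and the assumption that the raw text alternates message and reply lines.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]

# Hypothetical limits: the dataset description does not state the real thresholds.
MAX_MESSAGE_TOKENS = 20
MAX_REPLY_TOKENS = 20


def pairs_from_lines(lines: Iterable[str]) -> List[Pair]:
    """Group lines into (message, reply) pairs.

    Assumption (hypothetical layout): lines alternate message, reply, message, reply, ...
    A trailing unpaired line is dropped.
    """
    it = iter(line.strip() for line in lines)
    # zip(it, it) consumes the same iterator twice per step, yielding consecutive pairs.
    return list(zip(it, it))


def filter_pairs(
    pairs: Iterable[Pair],
    max_msg: int = MAX_MESSAGE_TOKENS,
    max_reply: int = MAX_REPLY_TOKENS,
) -> List[Pair]:
    """Keep pairs whose message and reply are non-empty and within the length limits.

    Length is measured in whitespace-separated tokens (an assumption for illustration).
    """
    kept = []
    for message, reply in pairs:
        msg_len = len(message.split())
        reply_len = len(reply.split())
        if 0 < msg_len <= max_msg and 0 < reply_len <= max_reply:
            kept.append((message, reply))
    return kept


if __name__ == "__main__":
    sample = ["hi there", "hello!", "a b c d e", "ok"]
    pairs = pairs_from_lines(sample)
    print(filter_pairs(pairs, max_msg=3, max_reply=3))
    # → [('hi there', 'hello!')]
```

In this sketch the second pair is discarded because its message has 5 tokens, above the illustrative limit of 3. Applying separate limits to the message and the reply mirrors the description's mention of constraints on both sides of each pair.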
Provided by:
Marsan-Ma-zz



