DiscoFuse
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/google-research-datasets/discofuse
Description:
DiscoFuse is a large-scale dataset for the sentence fusion task. It contains 4.5 million examples drawn from Wikipedia and 12 million sports-related examples, for a total of 16.5 million examples. The data is split into 98% training, 1% test, and 1% development sets, and it is used to evaluate fusion models in both in-domain and cross-domain settings.
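As a rough illustration of what the quoted figures imply, the sketch below works out approximate split sizes. It assumes the 98/1/1 split applies to each domain separately, which the description does not state, and it treats the example counts as round numbers; the names `DOMAIN_SIZES`, `SPLIT_PERCENT`, and `split_sizes` are hypothetical and not part of the dataset's tooling.

```python
# Approximate example counts quoted in the description (round numbers, not file sizes).
DOMAIN_SIZES = {"wikipedia": 4_500_000, "sports": 12_000_000}

# Split fractions quoted in the description, as integer percentages.
# Assumption: the same percentages apply within each domain.
SPLIT_PERCENT = {"train": 98, "test": 1, "dev": 1}


def split_sizes(total: int) -> dict[str, int]:
    """Return approximate per-split counts for a corpus of `total` examples."""
    return {name: total * pct // 100 for name, pct in SPLIT_PERCENT.items()}


total = sum(DOMAIN_SIZES.values())
sizes = {domain: split_sizes(n) for domain, n in DOMAIN_SIZES.items()}

print(f"total examples: {total:,}")
for domain, per_split in sizes.items():
    print(domain, per_split)
```

Under these assumptions, each 1% split is still large (tens of thousands of examples or more), which is what makes separate in-domain and cross-domain evaluation practical.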



