MLQuestions
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/mcgill-nlp/mlquestions
Description:
This dataset comprises 35,000 unaligned questions, 50,000 unaligned paragraphs, and 3,000 aligned question-paragraph pairs. It is designed to support the evaluation of domain adaptation methods, in particular the comparison of back-training and self-training. The target task is unsupervised domain adaptation for question generation and paragraph retrieval.



