Topic-MSMARCO
Source: arXiv | Updated 2023-08-16 | Indexed 2024-08-06
Download link:
http://arxiv.org/abs/2308.08378v1
Description:
Topic-MSMARCO is a subset of MSMARCO designed for evaluating continual information retrieval. It comprises 6 tasks, each representing a distinct topic such as IT, furniture, or food. The tasks are built by clustering query texts into topics using a Word2Vec model and the K-means algorithm. The dataset targets catastrophic forgetting in continual learning, particularly in neural information retrieval models, by simulating the way new information continuously emerges in real-world data.
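The construction described above (word embeddings of query text followed by K-means clustering) can be illustrated with a minimal sketch. It is not the dataset authors' pipeline: the 2-D vectors in `TOY_VECTORS` are hand-made stand-ins for trained Word2Vec embeddings, mean pooling of word vectors into a query vector is an assumption (the description does not say how queries are embedded), and the small `kmeans` function is a plain Lloyd's-algorithm implementation rather than a library call. All function and variable names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy 2-D "word vectors". A real pipeline would load
# embeddings from a trained Word2Vec model instead.
TOY_VECTORS: Dict[str, Vector] = {
    "laptop": [0.9, 0.1], "software": [1.0, 0.0], "install": [0.8, 0.2],
    "recipe": [0.1, 0.9], "bake": [0.0, 1.0], "bread": [0.2, 0.8],
}


def embed_query(query: str, vectors: Dict[str, Vector]) -> Vector:
    """Mean-pool the vectors of known words (assumed aggregation)."""
    known = [vectors[w] for w in query.lower().split() if w in vectors]
    if not known:
        raise ValueError(f"no known words in query: {query!r}")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int,
           iterations: int = 20, seed: int = 0) -> List[int]:
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_queries(queries: Sequence[str], vectors: Dict[str, Vector],
                    k: int) -> List[int]:
    """Embed each query, then group the queries into k topic clusters."""
    return kmeans([embed_query(q, vectors) for q in queries], k)


if __name__ == "__main__":
    queries = ["install software laptop", "laptop software",
               "bake bread recipe", "bread recipe"]
    for query, label in zip(queries, cluster_queries(queries, TOY_VECTORS, k=2)):
        print(label, query)
```

With k set to 6, each resulting cluster would correspond to one topic task. The cluster indices themselves are arbitrary, so a topic name such as IT or food would have to be assigned by inspecting the queries in each cluster.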
Provider:
Department of Computer Science, School of Science, Loughborough University
Created:
2023-08-16



