mRobust04

Source: arXiv. Updated 2022-09-28. Indexed 2024-06-21.
Download link:
https://huggingface.co/datasets/unicamp-dl/mrobust
Description:
mRobust04 is a multilingual information retrieval evaluation dataset created by the Brazilian Neural Network Laboratory and the School of Electrical and Computer Engineering of the University of São Paulo. It is based on the TREC Robust 2004 benchmark, translated into 8 languages with Google Translate. The dataset contains 528,000 records and provides multiple relevance judgments per query, which addresses the problem of sparse judgments in multilingual information retrieval. To build it, the queries and documents of the original English dataset were translated with the Google Translate API; documents too long to translate in one request were split into smaller chunks, translated independently, and then merged. mRobust04 is suited to evaluating multilingual retrieval models: because it is densely annotated, it supports more accurate evaluation of retrieval methods.
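
The split, translate, and merge step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the description does not say how documents were split or rejoined, so the character limit, the whitespace-based splitting, and the space-joined merge are all assumptions, and `translate` is a hypothetical callable standing in for whatever translation client is used.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text on whitespace into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split (assumption).
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def translate_long_document(
    text: str,
    translate: Callable[[str], str],  # hypothetical translation function
    max_chars: int = 5000,            # assumed per-request limit
) -> str:
    """Translate each chunk independently, then merge the results with spaces."""
    translated = [translate(chunk) for chunk in split_into_chunks(text, max_chars)]
    return " ".join(translated)
```

Translating chunks independently keeps each request under the size limit, at the cost of losing context across chunk boundaries, so a real pipeline would probably prefer to split at sentence or paragraph boundaries.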
Provider:
Brazilian Neural Network Laboratory; School of Electrical and Computer Engineering, University of São Paulo, Brazil
Created:
2022-09-28