UQV100: An IR Test Collection With Query Variability

Source: DataCite Commons; updated 2025-05-01; indexed 2025-04-17
Download link:
https://melbourne.figshare.com/articles/dataset/UQV100_An_IR_Test_Collection_With_Query_Variability/3180694
Resource summary:
Abstract from the SIGIR 2016 short paper (DOI below): We describe the UQV100 test collection, designed to incorporate variability from users. Information need “backstories” were written for 100 topics (or sub-topics) from the TREC 2013 and 2014 Web Tracks. Crowd workers were asked to read the backstories, and provide the queries they would use; plus effort estimates of how many useful documents they would have to read to satisfy the need. A total of 10,835 queries were collected from 263 workers. After normalization and spell-correction, 5,764 unique variations remained; these were then used to construct a document pool via Indri-BM25 over the ClueWeb12 corpus. Qualified crowd workers made relevance judgments relative to the backstories, using a relevance scale similar to the original TREC approach; first to a pool depth of ten per query, then deeper on a set of targeted documents. The backstories, query variations, normalized and spell-corrected queries, effort estimates, run outputs, and relevance judgments are made available collectively as the UQV100 test collection. We also make available the judging guidelines and the gold hits we used for crowd-worker qualification and spam detection. We believe this test collection will unlock new opportunities for novel investigations and analysis, including for problems such as task-intent retrieval performance and consistency (independent of query variation), query clustering, query difficulty prediction, and relevance feedback, among others.

Files: Download uqv100-allfiles.zip to get all of the files available as part of this collection, including README.txt.

Citation: Please cite the paper linked below if you make use of the collection.

Authors: Peter Bailey (Microsoft), Alistair Moffat (The University of Melbourne), Falk Scholer (RMIT University), Paul Thomas (Microsoft).
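The first judging stage described above pools documents to a depth of ten per query variation. A minimal sketch of that kind of depth-k pooling follows, assuming each run is held in memory as a ranked list of document IDs keyed by (topic, query). The function name `build_pool` and this data layout are hypothetical illustrations, not the UQV100 file formats or the authors' tooling, and the later "deeper, targeted" judging stage is not modeled.

```python
from collections import defaultdict


def build_pool(runs, depth=10):
    """Depth-k pooling: union of the top-`depth` documents of every query variation, per topic.

    runs:  mapping (topic_id, query_text) -> ranked list of document IDs, best first.
           This layout is a hypothetical assumption for illustration only.
    depth: number of top-ranked documents taken from each variation's ranking.
    Returns a plain dict mapping topic_id -> set of document IDs to be judged.
    """
    pools = defaultdict(set)
    for (topic_id, _query), ranking in runs.items():
        # Only the first `depth` documents of each variation's ranking enter the pool;
        # documents retrieved by several variations are judged once per topic.
        pools[topic_id].update(ranking[:depth])
    return dict(pools)


if __name__ == "__main__":
    # Toy data: two variations of topic "t1" whose results overlap on "d2".
    runs = {
        ("t1", "cheap flights"): ["d1", "d2", "d3"],
        ("t1", "low cost airfare"): ["d2", "d4", "d5"],
        ("t2", "jaguar speed"): ["d9", "d8"],
    }
    pools = build_pool(runs, depth=2)
    print(sorted(pools["t1"]))  # ['d1', 'd2', 'd4']
    print(sorted(pools["t2"]))  # ['d8', 'd9']
```

Pooling per topic rather than per query reflects that judgments are made relative to each topic's backstory, so a document surfaced by many variations needs only one judgment for that topic.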
Provided by:
University of Melbourne
Created:
2016-05-11
Dataset introduction
Background overview
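UQV100 is an information retrieval test collection focused on query variability. It contains 10,835 raw queries and 5,764 unique query variations (after normalization and spell-correction) across 100 topics. Built over the ClueWeb12 corpus, it provides relevance judgments and user effort estimates (how many useful documents a searcher expects to read), and supports research on retrieval performance, query clustering, and query difficulty prediction.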
The summary above was compiled and generated by 遇见数据集.