Data for: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation

Mendeley Data. Updated 2024-06-25; indexed 2024-06-26.
Download link:
https://data.mendeley.com/datasets/mdwvttzt48
Description:
We considered the following standard, shared collections, each track using 50 different topics:
• TREC Adhoc tracks T07 and T08: focus on a news search task and use a corpus of about 528K news documents.
• TREC Web tracks T09 and T10: focus on a Web search task and use a corpus of 1.7M Web pages.
• TREC Terabyte tracks T14 and T15: focus on a Web search task and use a corpus of 125M Web pages.

We considered three main components of an IR system: stop list, stemmer, and IR model. We selected a set of alternative implementations of each component and, using the Terrier v.4.02 open source system, created a run for each system obtained by combining the available components in all possible ways. The selected components are:
• Stop list: nostop, indri, lucene, snowball, smart, terrier;
• Stemmer: nolug, weakPorter, porter, snowballPorter, krovetz, lovins;
• Model: bb2, bm25, dfiz, dfree, dirichletlm, dlh, dph, hiemstralm, ifb2, inb2, inl2, inexpb2, jskls, lemurtfidf, lgd, pl2, tfidf.

Overall, these components define a 6 × 6 × 17 factorial design, giving a grid of points (GoP) of 612 system runs. They cover nearly all the state-of-the-art components that form the common denominator of almost any IR system for English retrieval, so they give a good account of what can be found in many different operational settings.
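The factorial design described above can be sketched in a few lines of Python. This is only an illustration of how the 612 combinations arise. The variable and function names are hypothetical, and the tuple layout is an assumption, not the naming scheme used for the run files in the dataset.

```python
from itertools import product

# Component labels copied from the description above.
STOP_LISTS = ["nostop", "indri", "lucene", "snowball", "smart", "terrier"]
STEMMERS = ["nolug", "weakPorter", "porter", "snowballPorter", "krovetz", "lovins"]
MODELS = [
    "bb2", "bm25", "dfiz", "dfree", "dirichletlm", "dlh", "dph", "hiemstralm",
    "ifb2", "inb2", "inl2", "inexpb2", "jskls", "lemurtfidf", "lgd", "pl2", "tfidf",
]


def grid_of_points():
    """Return every (stop list, stemmer, model) combination as a tuple."""
    return list(product(STOP_LISTS, STEMMERS, MODELS))


if __name__ == "__main__":
    grid = grid_of_points()
    # 6 stop lists x 6 stemmers x 17 models
    print(len(grid))
```

Each tuple identifies one system configuration; pairing it with one of the six tracks (T07, T08, T09, T10, T14, T15) would identify a single run.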
Created:
2024-01-23