IR Lab Cologne/Jena/Kassel Winter Term 2024/2025

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14254043
Resource description:
The Datasets for the Information Retrieval Courses in Cologne/Jena/Kassel in Winter Term 2024/2025

This repository contains resources coupled to ir_datasets and TIREx for IR courses that focus their hands-on labs on shared tasks. During the IR exercises in winter term 2024/2025, we collaboratively developed and evaluated IR systems in a shared-task-style setup, covering corpus creation, system development, and statistical analysis. The resulting artifacts, i.e., the documents, topics, runs, and relevance judgments, can be browsed at https://tira.io/task-overview/ir-lab-wise-2024. This Zenodo artifact contains all of the underlying datasets used and produced during the course, together with instructions on how to access the data using ir_datasets.

The artifact includes the following files:

- subsampled-ms-marco-deep-learning-20241201-training-inputs.zip: the training inputs, i.e., the document corpus and the topics.
- subsampled-ms-marco-deep-learning-20241201-training-truths.zip: the training truths used to evaluate and tune systems, i.e., the topics and relevance judgments.

Accessing the Data with ir_datasets

We provide wrapper code to access the resources with ir_datasets:

    # this loads a patched version of ir_datasets that can load resources from TIRA
    from tira.third_party_integrations import ir_datasets

    training_dataset = ir_datasets.load('ir-lab-wise-2024/subsampled-ms-marco-deep-learning-20241201-training')

The same is possible with the ir_datasets integration in PyTerrier:

    from tira.third_party_integrations import ensure_pyterrier_is_loaded
    import pyterrier as pt

    # this patches ir_datasets and loads PyTerrier so that it can load resources from TIRA and can run in the TIRA sandbox
    ensure_pyterrier_is_loaded()
    training_dataset = pt.datasets.get_dataset('irds:ir-lab-wise-2024/subsampled-ms-marco-deep-learning-20241201-training')
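The description says the training truths (topics and relevance judgments) are meant for evaluating and tuning systems. The following is a minimal sketch of what such an evaluation could look like, not the course's official evaluation setup. It computes nDCG@10 with linear gains in plain Python. The `Qrel` record, the helper names, and the toy data are hypothetical and only mirror the field names (`query_id`, `doc_id`, `relevance`) that ir_datasets conventionally uses for relevance judgments.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for the records yielded by dataset.qrels_iter();
# the field names follow the usual ir_datasets convention (an assumption here).
Qrel = namedtuple("Qrel", ["query_id", "doc_id", "relevance"])


def qrels_to_dict(qrels):
    """Group relevance judgments as {query_id: {doc_id: relevance}}."""
    judged = {}
    for qrel in qrels:
        judged.setdefault(qrel.query_id, {})[qrel.doc_id] = int(qrel.relevance)
    return judged


def ndcg_at_k(ranking, judgments, k=10):
    """nDCG@k for one query with linear gains; unjudged documents count as 0."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranking[:k]]
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    ideal = sorted(judgments.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg(run, judged, k=10):
    """Average nDCG@k over all judged queries; queries missing from the run score 0."""
    scores = [ndcg_at_k(run.get(qid, []), docs, k) for qid, docs in judged.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data for illustration only; not taken from the course datasets.
toy_qrels = [
    Qrel("q1", "d1", 2),
    Qrel("q1", "d2", 0),
    Qrel("q1", "d3", 1),
    Qrel("q2", "d4", 1),
]
# A run maps each query ID to its ranked list of document IDs (best first).
toy_run = {"q1": ["d1", "d3", "d2"], "q2": ["d9", "d4"]}

judged = qrels_to_dict(toy_qrels)
print(f"mean nDCG@10 = {mean_ndcg(toy_run, judged):.4f}")
```

With the real data, passing `training_dataset.qrels_iter()` to `qrels_to_dict` should work in the same way, assuming the TIRA-patched loader yields records with these field names. For reported results, an established evaluation library is likely the safer choice than a hand-written measure.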
Created:
2025-01-26