IR Lab Cologne/Jena/Kassel Winter Term 2024/2025
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14254043
Resource description:
The Datasets for the Information Retrieval Courses in Cologne/Jena/Kassel in Winter Term 2024/2025
This repository contains resources coupled to ir_datasets and TIREx for IR courses whose hands-on labs are organized around shared tasks. During the IR exercises in winter term 2024/2025, we collaboratively developed and evaluated IR systems in a shared-task-style setup, covering corpus creation, system development, and statistical analysis. The resulting artifacts, i.e., the documents, topics, runs, and relevance judgments, can be browsed at https://tira.io/task-overview/ir-lab-wise-2024. This Zenodo artifact contains all of the underlying datasets used and produced during the course, together with instructions on how to access the data using ir_datasets.
The artifacts in this dataset include the following files:
subsampled-ms-marco-deep-learning-20241201-training-inputs.zip contains the training inputs, i.e., the document corpus and the topics.
subsampled-ms-marco-deep-learning-20241201-training-truths.zip contains the training truths used to evaluate and tune systems, i.e., the topics and relevance judgments.
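To illustrate how relevance judgments such as those in the training truths can be used to evaluate a system, the following minimal sketch computes nDCG@k for a single topic. It is not the course's evaluation code: the function names, the dictionary layout of the judgments, and the choice of linear gain are assumptions made for illustration, and unjudged documents are treated as non-relevant.

```python
import math


def dcg(gains):
    """Discounted cumulative gain for a list of graded relevance labels."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """nDCG@k for one topic (hypothetical helper, linear gain).

    ranked_doc_ids: document ids in the order the system returned them.
    judgments: dict mapping doc_id -> graded relevance label.
    Documents missing from `judgments` are assumed to have label 0.
    """
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    # The ideal ranking lists the judged documents by decreasing label.
    ideal_gains = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Calling `ndcg_at_k(["d3", "d1", "d4"], {"d1": 3, "d2": 0, "d3": 1})` scores a ranking that places a marginally relevant document above a highly relevant one. For reported results, an established evaluation library is preferable to a hand-written metric.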
Accessing the Data with ir_datasets
We provide wrapper code to easily access the resources with ir_datasets:
# this loads a patched version of ir_datasets that can load resources from TIRA
from tira.third_party_integrations import ir_datasets
training_dataset = ir_datasets.load('ir-lab-wise-2024/subsampled-ms-marco-deep-learning-20241201-training')
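Once loaded, the dataset can be inspected as a quick sanity check. The sketch below assumes that the patched dataset exposes the standard ir_datasets iterators (`docs_iter`, `queries_iter`, `qrels_iter`) and that each relevance judgment has a `relevance` attribute; the helper names `load_training_dataset` and `summarize` are hypothetical and not part of TIRA or ir_datasets.

```python
from collections import Counter


def load_training_dataset():
    """Load the training dataset through TIRA's patched ir_datasets.

    The import is local so that `summarize` can be used without the
    `tira` package installed. Loading needs network access.
    """
    from tira.third_party_integrations import ir_datasets

    return ir_datasets.load(
        "ir-lab-wise-2024/subsampled-ms-marco-deep-learning-20241201-training"
    )


def summarize(dataset):
    """Count documents, topics, and relevance judgments per label.

    Assumes `dataset` follows the ir_datasets interface:
    docs_iter(), queries_iter(), and qrels_iter() yielding records
    with a `relevance` attribute.
    """
    num_documents = sum(1 for _ in dataset.docs_iter())
    num_topics = sum(1 for _ in dataset.queries_iter())
    label_counts = Counter(qrel.relevance for qrel in dataset.qrels_iter())
    return {
        "documents": num_documents,
        "topics": num_topics,
        "judgments_per_label": dict(sorted(label_counts.items())),
    }
```

A typical call would be `summarize(load_training_dataset())`. Iterating the whole corpus just to count documents is slow for large collections, so this is meant for small, subsampled corpora like the one described here.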
The same dataset can also be loaded through the ir_datasets integration in PyTerrier:
from tira.third_party_integrations import ensure_pyterrier_is_loaded
import pyterrier as pt
# this patches ir_datasets and loads PyTerrier so that it can load resources from TIRA and can run in the TIRA sandbox
ensure_pyterrier_is_loaded()
training_dataset = pt.datasets.get_dataset('irds:ir-lab-wise-2024/subsampled-ms-marco-deep-learning-20241201-training')
Created:
2025-01-26



