irds/wikiclir_sw

Name: irds/wikiclir_sw
Creator: irds
Published: 2023-01-05 04:00:32
License: not specified

Hugging Face, updated 2023-01-05, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/irds/wikiclir_sw

Summary:

The `wikiclir/sw` dataset is provided via the `ir-datasets` package and is primarily used for text retrieval tasks. It consists of three core components: documents (docs), queries, and relevance judgments (qrels). The document component contains 37,079 records, the query component includes 22,860 records, and the relevance judgment component has 57,924 records.

Provider:

irds

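Original information summary

Dataset overview

Dataset name

wikiclir/sw

Dataset source

Provided by the ir-datasets package.

Dataset contents

docs (documents, i.e. the corpus); count=37,079
queries (queries, i.e. topics); count=22,860
qrels (relevance judgments); count=57,924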

Usage example

from datasets import load_dataset

# Documents (the corpus)
docs = load_dataset('irds/wikiclir_sw', 'docs')
for record in docs:
    record  # {'doc_id': ..., 'title': ..., 'text': ...}

# Queries (the topics)
queries = load_dataset('irds/wikiclir_sw', 'queries')
for record in queries:
    record  # {'query_id': ..., 'text': ...}

# Relevance judgments linking queries to documents
qrels = load_dataset('irds/wikiclir_sw', 'qrels')
for record in qrels:
    record  # {'query_id': ..., 'doc_id': ..., 'relevance': ..., 'iteration': ...}
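Retrieval evaluation usually needs the qrels grouped by query rather than as a flat list. The following is a minimal sketch of that grouping, assuming records shaped like the qrels fields listed above (query_id, doc_id, relevance). The helper name `group_qrels` and the sample rows are hypothetical and are not part of the dataset or of any library.

```python
from collections import defaultdict


def group_qrels(qrels):
    """Map each query_id to a {doc_id: relevance} dict."""
    by_query = defaultdict(dict)
    for row in qrels:
        by_query[row["query_id"]][row["doc_id"]] = row["relevance"]
    return dict(by_query)


# Hypothetical sample rows that mimic the qrels record layout;
# real rows would come from iterating over the loaded qrels.
sample_qrels = [
    {"query_id": "q1", "doc_id": "d10", "relevance": 2, "iteration": "0"},
    {"query_id": "q1", "doc_id": "d11", "relevance": 1, "iteration": "0"},
    {"query_id": "q2", "doc_id": "d10", "relevance": 1, "iteration": "0"},
]

grouped = group_qrels(sample_qrels)
```

With this layout, `grouped["q1"]` gives every judged document for one query in a single lookup, which is the form most ranking metrics expect.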

Citation

@inproceedings{sasaki-etal-2018-cross,
    title = "Cross-Lingual Learning-to-Rank with Shared Representations",
    author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro",
    booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
    month = jun,
    year = "2018",
    address = "New Orleans, Louisiana",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N18-2073",
    doi = "10.18653/v1/N18-2073",
    pages = "458--463"
}
