
irds/wikiclir_it

Source: Hugging Face. Updated: 2023-01-05. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/irds/wikiclir_it
Summary:
The `wikiclir/it` dataset is provided by the `ir-datasets` package and has three components: documents (docs), queries, and relevance judgments (qrels). It contains 1,347,011 documents, 808,605 queries, and 3,443,633 relevance judgments, and is used primarily for text retrieval tasks.
Provider:
irds
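Original information summary

Dataset overview

Dataset name

wikiclir/it

Data provider

Provided by the `ir-datasets` package.

Data contents

  • docs (documents, i.e. the corpus); count=1,347,011
  • queries (queries, i.e. topics); count=808,605
  • qrels (relevance judgments); count=3,443,633

Usage example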

    from datasets import load_dataset

    # Load the document corpus
    docs = load_dataset('irds/wikiclir_it', 'docs')
    for record in docs:
        record  # {'doc_id': ..., 'title': ..., 'text': ...}

    # Load the queries (topics)
    queries = load_dataset('irds/wikiclir_it', 'queries')
    for record in queries:
        record  # {'query_id': ..., 'text': ...}

    # Load the relevance judgments
    qrels = load_dataset('irds/wikiclir_it', 'qrels')
    for record in qrels:
        record  # {'query_id': ..., 'doc_id': ..., 'relevance': ..., 'iteration': ...}
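Retrieval evaluation usually needs the relevance judgments grouped by query. The following is a minimal sketch of that grouping, not part of the dataset card. It assumes records are dictionaries with the `query_id`, `doc_id`, and `relevance` fields listed in the usage example; the sample records and the helper name `build_qrels_index` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records, shaped like the qrels records in the usage example.
# The IDs and relevance values are made up; they are not taken from the dataset.
sample_qrels = [
    {"query_id": "q1", "doc_id": "d10", "relevance": 2, "iteration": "0"},
    {"query_id": "q1", "doc_id": "d11", "relevance": 1, "iteration": "0"},
    {"query_id": "q2", "doc_id": "d10", "relevance": 1, "iteration": "0"},
]


def build_qrels_index(records):
    """Group relevance judgments by query: {query_id: {doc_id: relevance}}."""
    index = defaultdict(dict)
    for record in records:
        index[record["query_id"]][record["doc_id"]] = record["relevance"]
    return dict(index)


qrels_index = build_qrels_index(sample_qrels)
print(qrels_index["q1"])  # {'d10': 2, 'd11': 1}
```

Because the helper accepts any iterable of records, the same function could in principle be applied to the loaded qrels records instead of the sample list. With 3,443,633 judgments, the resulting dictionary is held entirely in memory.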

Citation

@inproceedings{sasaki-etal-2018-cross,
    title = "Cross-Lingual Learning-to-Rank with Shared Representations",
    author = "Sasaki, Shota and Sun, Shuo and Schamoni, Shigehiko and Duh, Kevin and Inui, Kentaro",
    booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
    month = jun,
    year = "2018",
    address = "New Orleans, Louisiana",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/N18-2073",
    doi = "10.18653/v1/N18-2073",
    pages = "458--463"
}
