dadakid/msmarco
Source: Hugging Face. Updated 2026-04-23; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/dadakid/msmarco
Description:
BEIR is a heterogeneous benchmark built from 18 diverse datasets representing 9 information retrieval tasks. This `msmarco` subset is part of BEIR. All tasks are in English (`en`). The dataset uses the standard BEIR retrieval layout, with two components: `corpus` (one row per document) and `queries` (one row per query). Rows in both components carry the same three fields: `_id` (unique identifier), `title` (an empty string when no title is available), and `text` (the document or query text).
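The row layout described above can be illustrated with a minimal Python sketch. It assumes rows have already been loaded as plain dictionaries with the `_id`, `title`, and `text` fields. The helper names `index_by_id` and `display_text` are hypothetical, and the sample rows are placeholders written for this example, not real MS MARCO records.

```python
from typing import Dict, Iterable

Row = Dict[str, str]


def index_by_id(rows: Iterable[Row]) -> Dict[str, Row]:
    """Map each row's `_id` to the row, rejecting duplicate identifiers."""
    index: Dict[str, Row] = {}
    for row in rows:
        row_id = row["_id"]
        if row_id in index:
            raise ValueError(f"duplicate _id: {row_id}")
        index[row_id] = row
    return index


def display_text(row: Row) -> str:
    """Prefix the text with the title, unless the title is an empty string."""
    title = row.get("title", "")
    return f"{title}. {row['text']}" if title else row["text"]


# Placeholder rows in the described layout (hypothetical, for illustration only).
corpus = [
    {"_id": "d1", "title": "", "text": "Example passage without a title."},
    {"_id": "d2", "title": "Example title", "text": "Example passage with a title."},
]
queries = [
    {"_id": "q1", "title": "", "text": "example query"},
]

corpus_index = index_by_id(corpus)
query_index = index_by_id(queries)

for doc_id, row in corpus_index.items():
    print(doc_id, "->", display_text(row))
```

Keying both components by `_id` is a convenient choice because that field is described as the unique identifier; the empty-title check follows from `title` being an empty string when unavailable.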
Provider:
dadakid



