---
pretty_name: '`msmarco-document-v2/trec-dl-2019`'
viewer: false
source_datasets: ['irds/msmarco-document-v2']
task_categories:
- text-retrieval
---
# Dataset Card for `msmarco-document-v2/trec-dl-2019`
The `msmarco-document-v2/trec-dl-2019` dataset is provided by the [ir-datasets](https://ir-datasets.com/) package.
For more information about the dataset, see the [documentation](https://ir-datasets.com/msmarco-document-v2#msmarco-document-v2/trec-dl-2019).
# Data
This dataset provides:
- `queries` (i.e., topics); count=200
- `qrels` (relevance assessments); count=13,940
- For `docs`, use [`irds/msmarco-document-v2`](https://huggingface.co/datasets/irds/msmarco-document-v2)

This dataset is used by: [`msmarco-document-v2_trec-dl-2019_judged`](https://huggingface.co/datasets/irds/msmarco-document-v2_trec-dl-2019_judged)
## Usage
```python
from datasets import load_dataset

# Load the queries (topics).
queries = load_dataset('irds/msmarco-document-v2_trec-dl-2019', 'queries')
for record in queries:
    record  # {'query_id': ..., 'text': ...}

# Load the relevance assessments.
qrels = load_dataset('irds/msmarco-document-v2_trec-dl-2019', 'qrels')
for record in qrels:
    record  # {'query_id': ..., 'doc_id': ..., 'relevance': ..., 'iteration': ...}
```
Note that calling `load_dataset` will download the dataset (or provide access instructions when it's not public) and make a copy of the
data in 🤗 Dataset format.
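
Evaluation code usually needs the relevance assessments grouped by query. The following is a minimal sketch of one way to do that, not part of ir-datasets or 🤗 Datasets. The helper name `group_qrels` is hypothetical, and the sample records are made up to mirror the `qrels` fields shown above so that the snippet runs without downloading anything.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def group_qrels(records: Iterable[Mapping]) -> Dict[str, Dict[str, int]]:
    """Group qrels records into {query_id: {doc_id: relevance}}.

    Hypothetical helper: it only assumes each record exposes the
    'query_id', 'doc_id', and 'relevance' fields listed in this card.
    """
    grouped: Dict[str, Dict[str, int]] = defaultdict(dict)
    for record in records:
        grouped[record["query_id"]][record["doc_id"]] = int(record["relevance"])
    return dict(grouped)


# Made-up sample records standing in for the real qrels.
sample_qrels = [
    {"query_id": "q1", "doc_id": "d1", "relevance": 2, "iteration": "0"},
    {"query_id": "q1", "doc_id": "d2", "relevance": 0, "iteration": "0"},
    {"query_id": "q2", "doc_id": "d3", "relevance": 1, "iteration": "0"},
]

judgments = group_qrels(sample_qrels)
print(judgments["q1"])
```

Any iterable of records with these three fields should work in place of `sample_qrels`. The `iteration` field is ignored here.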
## Citation Information
```
@inproceedings{Craswell2019TrecDl,
title={Overview of the TREC 2019 deep learning track},
author={Nick Craswell and Bhaskar Mitra and Emine Yilmaz and Daniel Campos and Ellen Voorhees},
booktitle={TREC 2019},
year={2019}
}
@inproceedings{Bajaj2016Msmarco,
title={MS MARCO: A Human Generated MAchine Reading COmprehension Dataset},
author={Payal Bajaj and Daniel Campos and Nick Craswell and Li Deng and Jianfeng Gao and Xiaodong Liu and Rangan Majumder and Andrew McNamara and Bhaskar Mitra and Tri Nguyen and Mir Rosenberg and Xia Song and Alina Stoica and Saurabh Tiwary and Tong Wang},
booktitle={InCoCo@NIPS},
year={2016}
}
```