irds/nyt_wksup
Source: Hugging Face. Updated: 2023-01-05. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/irds/nyt_wksup
Description:
---
pretty_name: '`nyt/wksup`'
viewer: false
source_datasets: ['irds/nyt']
task_categories:
- text-retrieval
---
# Dataset Card for `nyt/wksup`
The `nyt/wksup` dataset, provided by the [ir-datasets](https://ir-datasets.com/) package.
For more information about the dataset, see the [documentation](https://ir-datasets.com/nyt#nyt/wksup).
# Data
This dataset provides:
- `queries` (i.e., topics); count=1,864,661
- `qrels` (relevance assessments); count=1,864,661
- For `docs`, use [`irds/nyt`](https://huggingface.co/datasets/irds/nyt)
## Usage
```python
from datasets import load_dataset

# Load the queries (topics) configuration
queries = load_dataset('irds/nyt_wksup', 'queries')
for record in queries:
    record # {'query_id': ..., 'text': ...}

# Load the qrels (relevance assessments) configuration
qrels = load_dataset('irds/nyt_wksup', 'qrels')
for record in qrels:
    record # {'query_id': ..., 'doc_id': ..., 'relevance': ...}
```
Note that calling `load_dataset` will download the dataset (or provide access instructions when it's not public) and make a copy of the
data in 🤗 Dataset format.
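Because `queries` and `qrels` share the `query_id` field, a common next step is to join them into (query text, document, relevance) triples. The following is a minimal sketch of that join, not part of the ir-datasets package: the helper names `group_qrels` and `training_pairs` are hypothetical, the toy records are made up for illustration, and the only field names assumed are the ones shown in the usage comments above.

```python
from collections import defaultdict


def group_qrels(qrels_records):
    """Map each query_id to a list of (doc_id, relevance) pairs."""
    grouped = defaultdict(list)
    for rec in qrels_records:
        grouped[rec["query_id"]].append((rec["doc_id"], rec["relevance"]))
    return dict(grouped)


def training_pairs(query_records, qrels_records):
    """Yield (query_text, doc_id, relevance) for every judged pair."""
    by_query = group_qrels(qrels_records)
    for query in query_records:
        for doc_id, relevance in by_query.get(query["query_id"], []):
            yield query["text"], doc_id, relevance


# Toy records standing in for the real data (values are invented).
toy_queries = [{"query_id": "q1", "text": "example headline"}]
toy_qrels = [{"query_id": "q1", "doc_id": "d1", "relevance": 1}]
pairs = list(training_pairs(toy_queries, toy_qrels))
```

With the real data, pass the record iterables obtained from `load_dataset` in place of the toy lists. Document text is not included here; the `doc_id` values refer to documents in `irds/nyt`.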
## Citation Information
```
@inproceedings{MacAvaney2019Wksup,
author = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir},
title = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking},
booktitle = {SIGIR},
year = {2019}
}
@article{Sandhaus2008Nyt,
title={The new york times annotated corpus},
author={Sandhaus, Evan},
journal={Linguistic Data Consortium, Philadelphia},
volume={6},
number={12},
pages={e26752},
year={2008}
}
```
Provider:
irds



