irds/nyt_wksup_train
Hugging Face | Updated 2023-01-05 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/irds/nyt_wksup_train
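Overview:
The `nyt/wksup/train` dataset is provided by the ir-datasets package and is intended for text retrieval tasks. It contains two parts, queries and relevance judgments (qrels): 1,863,657 queries and 1,863,657 qrels. The documents are not included and must be obtained from the `irds/nyt` dataset. The data can be loaded and accessed with Hugging Face's `load_dataset` function.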
Provider:
irds
Summary of original information
Dataset overview
Dataset name
nyt/wksup/train
Data source
- Source dataset:
irds/nyt
Task category
- Text retrieval
Data contents
- queries: queries (topics), 1,863,657 records
- qrels: relevance judgments, 1,863,657 records
- docs: document data, to be obtained from irds/nyt
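Because the documents live in a separate dataset, the `doc_id` values found in the qrels have to be resolved against records from `irds/nyt`. The following is a minimal sketch of that lookup, not part of the original dataset card. It assumes each document record is a dict carrying a `doc_id` field; the helper name `index_docs_by_id` and the sample records (including the `headline` field) are hypothetical.

```python
def index_docs_by_id(docs, wanted_ids):
    """Return {doc_id: record} for the documents whose doc_id is wanted.

    `docs` is any iterable of dict records with a "doc_id" key (an assumption
    about the document schema); `wanted_ids` is an iterable of doc_id strings.
    """
    wanted = set(wanted_ids)
    return {d["doc_id"]: d for d in docs if d["doc_id"] in wanted}


# Hypothetical in-memory records standing in for documents from irds/nyt.
sample_docs = [
    {"doc_id": "d1", "headline": "first example"},
    {"doc_id": "d2", "headline": "second example"},
]
found = index_docs_by_id(sample_docs, ["d2", "d9"])
```

Collecting the wanted ids first and making a single pass over the documents avoids holding the whole corpus in memory.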
Usage example
from datasets import load_dataset

queries = load_dataset('irds/nyt_wksup_train', 'queries')
for record in queries:
    record # {'query_id': ..., 'text': ...}

qrels = load_dataset('irds/nyt_wksup_train', 'qrels')
for record in qrels:
    record # {'query_id': ..., 'doc_id': ..., 'relevance': ...}
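Queries and qrels share a `query_id`, so a common next step is to join them. The sketch below is one possible way to do that, assuming only the record fields shown in the comments above (`query_id`, `text`, `doc_id`, `relevance`). The function name `group_qrels_by_query` and the sample records are hypothetical and used for illustration only.

```python
from collections import defaultdict


def group_qrels_by_query(queries, qrels):
    """Attach each query's text to its list of (doc_id, relevance) judgments."""
    # Map query_id to query text for quick lookup.
    text_by_id = {q["query_id"]: q["text"] for q in queries}

    # Collect the judgments for each query_id.
    judged = defaultdict(list)
    for r in qrels:
        judged[r["query_id"]].append((r["doc_id"], r["relevance"]))

    # Queries without any judgment are omitted; unknown query ids get text None.
    return {
        qid: {"text": text_by_id.get(qid), "judgments": pairs}
        for qid, pairs in judged.items()
    }


# Hypothetical in-memory records with the same fields as the real ones.
sample_queries = [{"query_id": "q1", "text": "example headline"}]
sample_qrels = [{"query_id": "q1", "doc_id": "d1", "relevance": 1}]
joined = group_qrels_by_query(sample_queries, sample_qrels)
```

The function accepts any iterables of dict records, so it can be tried on small samples before being applied to the full 1,863,657 records.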
Citation
@inproceedings{MacAvaney2019Wksup,
  author = {MacAvaney, Sean and Yates, Andrew and Hui, Kai and Frieder, Ophir},
  title = {Content-Based Weak Supervision for Ad-Hoc Re-Ranking},
  booktitle = {SIGIR},
  year = {2019}
}
@article{Sandhaus2008Nyt,
  title={The new york times annotated corpus},
  author={Sandhaus, Evan},
  journal={Linguistic Data Consortium, Philadelphia},
  volume={6},
  number={12},
  pages={e26752},
  year={2008}
}



