grill-lab/browsecomp-plus-passage-corpus
Source: Hugging Face. Updated 2026-04-08; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/grill-lab/browsecomp-plus-passage-corpus
Resource description:
---
license: mit
task_categories:
- text-retrieval
tags:
- retrieval-augmented-generation
- deep-research
- search
---
# Passage Corpus for the BrowseComp-Plus Dataset
This repository contains the passage corpus for the BrowseComp-Plus dataset, used in the paper [Revisiting Text Ranking in Deep Research](https://arxiv.org/abs/2602.21456), which has been accepted at **SIGIR 2026**, the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval.
**Code:** [https://github.com/ChuanMeng/text-ranking-in-deep-research](https://github.com/ChuanMeng/text-ranking-in-deep-research)
The corpus consists of 2,772,255 passages. The file format follows the Tevatron data format. Each item contains three fields: `docid`, `title`, and `text`.
- `docid` denotes the unique passage identifier.
- `title` denotes the title of the source document from which the passage is extracted.
- `text` contains the passage content.
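As a rough illustration of the three-field layout described above, the following sketch reads passages from a JSON Lines stream and checks that each record carries `docid`, `title`, and `text`. It assumes one JSON object per line, which is common for Tevatron-style corpora but is not confirmed by this card. The helper name `iter_passages` and the sample records are hypothetical and are not taken from the actual corpus.

```python
import io
import json
from typing import Iterator, TextIO

# The three fields named in the dataset card.
REQUIRED_FIELDS = ("docid", "title", "text")


def iter_passages(stream: TextIO) -> Iterator[dict]:
    """Yield one passage dict per non-empty line, validating the expected fields.

    Assumption: the corpus is stored as JSON Lines (one JSON object per line).
    """
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        yield record


# Made-up records for illustration only; not real corpus content.
sample = io.StringIO(
    '{"docid": "d1", "title": "Example title", "text": "First passage."}\n'
    '{"docid": "d2", "title": "Example title", "text": "Second passage."}\n'
)

passages = list(iter_passages(sample))
print(len(passages), passages[0]["docid"])
```

To read a real file, the same function can be given an open file handle in place of the in-memory `sample`. Streaming line by line avoids holding all 2,772,255 passages in memory at once.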
We also provide the passage corpus in Pyserini format; see [here](https://huggingface.co/datasets/grill-lab/browsecomp-plus-passage-corpus-pyserini).
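For readers curious how the two layouts relate, here is a minimal sketch of mapping one Tevatron-style record to the `id` / `contents` shape that Pyserini's JSON collections expect. How the authors actually combined `title` and `text` in the linked Pyserini corpus is not stated here; joining them with a newline is purely an assumption, and `to_pyserini` is a hypothetical helper name.

```python
import json


def to_pyserini(record: dict) -> dict:
    """Map a {docid, title, text} record to a Pyserini-style {id, contents} record.

    Assumption: title and text are joined with a newline. The published
    Pyserini corpus may combine them differently.
    """
    title = record.get("title", "").strip()
    text = record["text"].strip()
    contents = f"{title}\n{text}" if title else text
    return {"id": str(record["docid"]), "contents": contents}


# Made-up record for illustration only; not real corpus content.
example = {"docid": "d1", "title": "Example title", "text": "First passage."}
converted = to_pyserini(example)
print(json.dumps(converted))
```

In practice the pre-built Pyserini corpus linked above is the safer choice, since it reflects whatever convention the authors used.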
## Contact
If you have any questions or suggestions, please contact:
- [Chuan Meng](https://chuanmeng.github.io/): chuan.meng@ed.ac.uk
- [Litu Ou](https://leonard907.github.io/): litu.ou@ed.ac.uk
## Citation
If you find this work useful, please cite:
```bibtex
@inproceedings{meng2026revisiting,
title={Revisiting Text Ranking in Deep Research},
author={Meng, Chuan and Ou, Litu and MacAvaney, Sean and Dalton, Jeff},
booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2026}
}
```
Provided by:
grill-lab



