
grill-lab/browsecomp-plus-passage-corpus

Source: Hugging Face | Updated: 2026-04-08 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/grill-lab/browsecomp-plus-passage-corpus
Description:
---
license: mit
task_categories:
- text-retrieval
tags:
- retrieval-augmented-generation
- deep-research
- search
---

# Passage Corpus for the BrowseComp-Plus Dataset

This repository contains the passage corpus for the BrowseComp-Plus dataset, used in the paper [Revisiting Text Ranking in Deep Research](https://arxiv.org/abs/2602.21456), which has been accepted at **SIGIR 2026**, the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval.

**Code:** [https://github.com/ChuanMeng/text-ranking-in-deep-research](https://github.com/ChuanMeng/text-ranking-in-deep-research)

The corpus consists of 2,772,255 passages. The file format follows the Tevatron data format. Each item contains three fields: `docid`, `title`, and `text`.

- `docid` denotes the unique passage identifier.
- `title` denotes the title of the source document from which the passage is extracted.
- `text` contains the passage content.

We also provide the passage corpus in Pyserini format; see [here](https://huggingface.co/datasets/grill-lab/browsecomp-plus-passage-corpus-pyserini).

## Contact

If you have any questions or suggestions, please contact:

- [Chuan Meng](https://chuanmeng.github.io/): chuan.meng@ed.ac.uk
- [Litu Ou](https://leonard907.github.io/): litu.ou@ed.ac.uk

## Citation

If you find this work useful, please cite:

```bibtex
@inproceedings{meng2026revisiting,
  title={Revisiting Text Ranking in Deep Research},
  author={Meng, Chuan and Ou, Litu and MacAvaney, Sean and Dalton, Jeff},
  booktitle={Proceedings of the 49th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2026}
}
```
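
## Example: reading passage records

The three-field record layout described in the card (`docid`, `title`, `text`) can be read with a few lines of Python. The block below is a minimal sketch, not the authors' loader. It assumes the corpus is stored as JSON Lines (one JSON object per line), which the card does not state explicitly. The helper name `iter_passages` and the sample records are hypothetical and exist only for illustration.

```python
import io
import json
from typing import Dict, Iterator, TextIO

# Field names taken from the dataset card.
REQUIRED_FIELDS = ("docid", "title", "text")


def iter_passages(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one passage dict per non-empty line of a JSON Lines stream.

    Assumption: each line is a JSON object holding the three card fields.
    """
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        # Keep only the documented fields, in a fixed order.
        yield {name: record[name] for name in REQUIRED_FIELDS}


# Made-up sample records; real passages come from the corpus files.
sample = io.StringIO(
    '{"docid": "d1", "title": "Example title", "text": "First passage."}\n'
    '{"docid": "d2", "title": "Example title", "text": "Second passage."}\n'
)
passages = list(iter_passages(sample))
print(len(passages), passages[0]["docid"])
```

Streaming line by line keeps memory use flat, which matters for a corpus of 2,772,255 passages. If the files turn out to use a different container format, only the parsing step inside the loop would need to change.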
Provided by:
grill-lab