RLinf/Wiki-2018-Corpus
Source: Hugging Face. Updated 2026-03-13; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/RLinf/Wiki-2018-Corpus
Resource description:
---
license: apache-2.0
task_categories:
- text-retrieval
---
# WideSeek-R1 Corpus
<div align="center">
[**🌐 Project Page**](https://wideseek-r1.github.io/) | [**📄 Paper**](https://arxiv.org/abs/2602.04634) | [**📖 Doc**](https://rlinf.readthedocs.io/en/latest/rst_source/examples/agentic/wideseek_r1/index.html) | [**💻 Code**](https://github.com/RLinf/RLinf/tree/main/examples/agent/wideseek_r1) | [**📦 Dataset**](https://huggingface.co/datasets/RLinf/WideSeek-R1-train-data) | [**🤗 Models**](https://huggingface.co/RLinf/WideSeek-R1-4b)
</div>
To train [**WideSeek-R1**](https://huggingface.co/RLinf/WideSeek-R1-4b) efficiently, we deployed a suite of local search tools that the model can use during training.
This repository includes three components:
* **wiki_corpus.jsonl**: Serves as the model’s **Search** tool. Given a query, it returns the most relevant snippets.
* **wiki_webpages.jsonl**: Serves as the model’s **Access** tool. Given a specific URL, it returns the full webpage content.
* **qdrant/**: A local **Qdrant** vector database built by embedding `wiki_corpus.jsonl`. It enables efficient retrieval and acts as the core backend for the Search tool.
Both `wiki_corpus.jsonl` and `wiki_webpages.jsonl` are sourced from the [ASearcher-Local-Knowledge](https://huggingface.co/datasets/inclusionAI/ASearcher-Local-Knowledge) dataset.
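
As a rough illustration of how the Access tool's URL-to-page lookup could be served from `wiki_webpages.jsonl`, here is a minimal sketch using only the Python standard library. The record field names (`url`, `contents`) and the helper names (`iter_jsonl`, `build_url_index`, `access`) are assumptions made for this example and are not confirmed by the dataset; inspect the first record of the file and adjust the keys before relying on it.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Optional


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def build_url_index(
    path: Path,
    url_key: str = "url",        # assumed field name, not confirmed by the dataset
    text_key: str = "contents",  # assumed field name, not confirmed by the dataset
) -> Dict[str, str]:
    """Map each URL to its page text, skipping records that lack either key."""
    index: Dict[str, str] = {}
    for record in iter_jsonl(path):
        if url_key in record and text_key in record:
            index[record[url_key]] = record[text_key]
    return index


def access(index: Dict[str, str], url: str) -> Optional[str]:
    """Return the stored page text for a URL, or None if the URL is unknown."""
    return index.get(url)
```

An in-memory dictionary keeps the sketch short, but the full file may not fit comfortably in RAM; in that case a disk-backed key-value store would likely be a better fit. The Search tool is a different path: it queries the prebuilt `qdrant/` database rather than scanning `wiki_corpus.jsonl` directly.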
## Acknowledgement
Thanks to [**ASearcher**](https://github.com/inclusionAI/ASearcher) for providing a comprehensive, high-quality wiki corpus.
## Citation
If you use this dataset in your research, please cite our paper:
```bibtex
@article{xu2026wideseek,
title = {WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning},
author = {Xu, Zelai and Xu, Zhexuan and Zhang, Ruize and Zhu, Chunyang and Yu, Shi and Liu, Weilin and Zhang, Quanlu and Ding, Wenbo and Yu, Chao and Wang, Yu},
journal = {arXiv preprint arXiv:2602.04634},
year = {2026},
}
```
Provider:
RLinf



