
RLinf/Wiki-2018-Corpus

Source: Hugging Face. Updated 2026-03-13; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/RLinf/Wiki-2018-Corpus
Resource description:
---
license: apache-2.0
task_categories:
- text-retrieval
---

# WideSeek-R1 Corpus

<div align="center">

[**🌐 Project Page**](https://wideseek-r1.github.io/) | [**📄 Paper**](https://arxiv.org/abs/2602.04634) | [**📖 Doc**](https://rlinf.readthedocs.io/en/latest/rst_source/examples/agentic/wideseek_r1/index.html) | [**💻 Code**](https://github.com/RLinf/RLinf/tree/main/examples/agent/wideseek_r1) | [**📦 Dataset**](https://huggingface.co/datasets/RLinf/WideSeek-R1-train-data) | [**🤗 Models**](https://huggingface.co/RLinf/WideSeek-R1-4b)

</div>

To train [**WideSeek-R1**](https://huggingface.co/RLinf/WideSeek-R1-4b) efficiently, we deployed a suite of local search tools that the model can use during training. This repository includes three components:

* **wiki_corpus.jsonl**: Serves as the model’s **Search** tool. Given a query, it returns the most relevant snippets.
* **wiki_webpages.jsonl**: Serves as the model’s **Access** tool. Given a specific URL, it returns the full webpage content.
* **qdrant/**: A local **Qdrant** vector database built by embedding `wiki_corpus.jsonl`. It enables efficient retrieval and acts as the core backend for the Search tool.

Both `wiki_corpus.jsonl` and `wiki_webpages.jsonl` are sourced from the [ASearcher-Local-Knowledge](https://huggingface.co/datasets/inclusionAI/ASearcher-Local-Knowledge) dataset.

# Acknowledgement

Thanks to [**ASearcher**](https://github.com/inclusionAI/ASearcher) for providing a comprehensive, high-quality wiki corpus.

## Citation

If you use this dataset in your research, please cite our paper:

```bibtex
@article{xu2026wideseek,
  title   = {WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning},
  author  = {Xu, Zelai and Xu, Zhexuan and Zhang, Ruize and Zhu, Chunyang and Yu, Shi and Liu, Weilin and Zhang, Quanlu and Ding, Wenbo and Yu, Chao and Wang, Yu},
  journal = {arXiv preprint arXiv:2602.04634},
  year    = {2026},
}
```
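
## Inspecting the JSONL files

Before wiring `wiki_corpus.jsonl` or `wiki_webpages.jsonl` into a Search or Access tool, it can help to look at what a few records contain. The description above does not document the field names of either file, so the following minimal sketch assumes only that each non-empty line is one JSON object and reports whatever keys it finds. The helper names `iter_jsonl` and `peek_keys` are hypothetical and are not part of the RLinf codebase.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    # Stream line by line so that a large corpus file is never loaded whole.
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def peek_keys(path: Path, limit: int = 3) -> list[list[str]]:
    """Return the sorted field names of the first `limit` records."""
    keys = []
    for index, record in enumerate(iter_jsonl(path)):
        if index >= limit:
            break
        # Assumption: each record is a JSON object, so sorting it yields its keys.
        keys.append(sorted(record))
    return keys
```

Assuming the file has been downloaded into the working directory, `peek_keys(Path("wiki_corpus.jsonl"))` would list the field names of the first three records. Streaming the file keeps memory use flat regardless of corpus size.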
Provider:
RLinf