
ReaRAG-20k

Source: ModelScope community. Updated 2025-12-02. Indexed 2025-07-19.
Download link:
https://modelscope.cn/datasets/THU-KEG/ReaRAG-20k
Resource description:
# 📘 Dataset Card for ReaRAG-20k

<p align="center"> 🤗 <a href="https://huggingface.co/THU-KEG/ReaRAG-9B" target="_blank">Model</a> • 💻 <a href="https://github.com/THU-KEG/ReaRAG" target="_blank">GitHub</a> • 📃 <a href="https://arxiv.org/abs/2503.21729" target="_blank">Paper</a> </p>

ReaRAG-20k is a reasoning-focused dataset designed for training the ReaRAG model. It contains approximately 20,000 multi-turn retrieval examples constructed from QA datasets such as HotpotQA, MuSiQue, and Natural Questions (NQ).

Each instance follows a conversational format that supports reasoning and retrieval steps:

```json
{
  "messages": [{"role": "user", "content": "..."}, {"role": "assistant", "reasoning": "..."}, {"role": "observation", "content": "..."}, ...]
}
```

During supervised fine-tuning (SFT), the loss is computed only on messages that contain the `reasoning` key, not on messages that contain the `content` key.

# 🔗 Resources

- **Code Repository:** [💻 GitHub](https://github.com/THU-KEG/ReaRAG)
- **Paper:** [📃 ArXiv](https://arxiv.org/abs/2503.21729)
- **Model:** [🤗 Hugging Face](https://huggingface.co/THU-KEG/ReaRAG-9B), a model based on GLM-4-9B and fine-tuned (SFT) on this dataset.

# 📚 Citation

If you use this dataset in your research or projects, please consider citing our work:

```
@article{lee2025rearag,
  title={ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation},
  author={Lee, Zhicheng and Cao, Shulin and Liu, Jinxin and Zhang, Jiajie and Liu, Weichuan and Che, Xiaoyin and Hou, Lei and Li, Juanzi},
  journal={arXiv preprint arXiv:2503.21729},
  year={2025}
}
```
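
The dataset card states that, during SFT, the loss is computed only on messages that carry the `reasoning` key. As a minimal sketch of what that rule could look like in a preprocessing step, the snippet below builds a per-token loss mask from one record. The function names (`split_trainable`, `build_loss_mask`), the whitespace tokenizer, and the toy record are all assumptions made for illustration; they are not taken from the ReaRAG code base, which may implement masking differently.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def split_trainable(messages: List[Message]) -> List[Tuple[str, str, bool]]:
    """Return (role, text, is_trainable) triples for one record.

    A message is treated as trainable only if it has a `reasoning` key,
    mirroring the rule described in the dataset card.
    """
    segments = []
    for msg in messages:
        if "reasoning" in msg:
            segments.append((msg["role"], msg["reasoning"], True))
        else:
            segments.append((msg["role"], msg.get("content", ""), False))
    return segments


def build_loss_mask(
    messages: List[Message],
    tokenize: Callable[[str], List[str]] = str.split,  # stand-in tokenizer (assumption)
) -> Tuple[List[str], List[int]]:
    """Flatten a record into tokens plus a 0/1 mask (1 = contributes to the loss)."""
    tokens: List[str] = []
    mask: List[int] = []
    for _role, text, trainable in split_trainable(messages):
        piece = tokenize(text)
        tokens.extend(piece)
        mask.extend([1 if trainable else 0] * len(piece))
    return tokens, mask


# Toy record invented for this sketch; it is not an actual dataset entry.
example = {
    "messages": [
        {"role": "user", "content": "Who wrote Dune?"},
        {"role": "assistant", "reasoning": "I should search for the author."},
        {"role": "observation", "content": "Frank Herbert wrote Dune."},
    ]
}

tokens, mask = build_loss_mask(example["messages"])
```

In a real training pipeline the whitespace tokenizer would be replaced by the model's own tokenizer and chat template, and masked-out positions would typically be mapped to whatever ignore value the loss function expects.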

Provider:
maas
Created:
2025-07-15