
RAG-Instruct

ModelScope community. Updated 2025-12-05; indexed 2025-01-25.
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/RAG-Instruct
Description:
## Introduction

RAG-Instruct is a RAG dataset designed to comprehensively enhance the RAG capabilities of LLMs, synthesized using GPT-4o. The dataset is based on the Wikipedia corpus and offers two advantages: diversity of query-document scenarios and diversity of tasks.

RAG-Instruct can significantly enhance the RAG ability of LLMs, with improvements across a range of tasks. In the table below, both Llama models score higher with RAG-Instruct on every benchmark listed.

| Model | WQA (acc) | PQA (acc) | TQA (acc) | OBQA (EM) | Pub (EM) | ARC (EM) | 2WIKI (acc) | HotP (acc) | MSQ (acc) | CFQA (EM) | PubMed (EM) |
|--------------------------------|-----------|-----------|-----------|-----------|----------|----------|-------------|------------|-----------|-----------|-------------|
| Llama3.2-3B | 58.7 | 61.8 | 69.7 | 77.0 | 55.0 | 66.8 | 55.6 | 40.2 | 13.2 | 46.8 | 70.3 |
| Llama3.1-8B | 59.5 | 60.8 | 73.4 | 82.0 | 56.7 | 77.1 | 65.6 | 45.6 | 18.7 | 56.5 | 73.9 |
| Llama3.2-3B + RAG-Instruct | 65.3 | 64.0 | 77.0 | 81.2 | 66.4 | 73.0 | 72.9 | 52.7 | 25.0 | 50.3 | 72.6 |
| Llama3.1-8B + RAG-Instruct | 69.7 | 68.4 | 79.3 | 84.8 | 77.2 | 79.9 | 79.3 | 56.4 | 30.3 | 57.8 | 77.0 |

For details, see our [paper](https://arxiv.org/abs/2501.00353) and [GitHub repository](https://github.com/FreedomIntelligence/RAG-Instruct).

## Citation

If you find our data useful, please consider citing our work!

```
@misc{liu2024raginstructboostingllmsdiverse,
  title={RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions},
  author={Wanlong Liu and Junying Chen and Ke Ji and Li Zhou and Wenyu Chen and Benyou Wang},
  year={2024},
  eprint={2501.00353},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.00353},
}
```
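
As a rough way to read the results table in the Introduction, the sketch below computes the per-benchmark gain of each "+ RAG-Instruct" row over its base model. The scores are copied from that table; the names `BENCHMARKS`, `SCORES`, `gains`, and `mean_gain` are illustrative and not part of the dataset or its repository. Because the columns mix accuracy and exact-match metrics, the mean gain is only an informal summary, not a figure reported by the authors.

```python
# Scores copied from the results table; list order follows the table columns.
BENCHMARKS = ["WQA", "PQA", "TQA", "OBQA", "Pub", "ARC",
              "2WIKI", "HotP", "MSQ", "CFQA", "PubMed"]

SCORES = {
    "Llama3.2-3B": [58.7, 61.8, 69.7, 77.0, 55.0, 66.8, 55.6, 40.2, 13.2, 46.8, 70.3],
    "Llama3.1-8B": [59.5, 60.8, 73.4, 82.0, 56.7, 77.1, 65.6, 45.6, 18.7, 56.5, 73.9],
    "Llama3.2-3B + RAG-Instruct": [65.3, 64.0, 77.0, 81.2, 66.4, 73.0, 72.9, 52.7, 25.0, 50.3, 72.6],
    "Llama3.1-8B + RAG-Instruct": [69.7, 68.4, 79.3, 84.8, 77.2, 79.9, 79.3, 56.4, 30.3, 57.8, 77.0],
}


def gains(base, tuned):
    """Return the per-benchmark gain (in points) of `tuned` over `base`."""
    return {
        name: round(after - before, 1)  # table values have one decimal place
        for name, before, after in zip(BENCHMARKS, SCORES[base], SCORES[tuned])
    }


def mean_gain(base, tuned):
    """Unweighted mean of the per-benchmark gains (informal summary only)."""
    per_benchmark = gains(base, tuned)
    return sum(per_benchmark.values()) / len(per_benchmark)


if __name__ == "__main__":
    for base in ("Llama3.2-3B", "Llama3.1-8B"):
        tuned = base + " + RAG-Instruct"
        per_benchmark = gains(base, tuned)
        best = max(per_benchmark, key=per_benchmark.get)
        print(f"{base}: mean gain {mean_gain(base, tuned):.2f} points, "
              f"largest on {best} (+{per_benchmark[best]})")
```

Running the script prints one summary line per base model; calling `gains(...)` directly returns the full per-benchmark breakdown as a dictionary.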

Provider: maas
Created: 2025-01-20