RAG-Instruct
Source: ModelScope community · Updated 2025-12-05 · Indexed 2025-01-25
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/RAG-Instruct
Resource description:
## Introduction
RAG-Instruct is a dataset, synthesized with GPT-4o, designed to comprehensively enhance the retrieval-augmented generation (RAG) capabilities of LLMs. It is built on the Wikipedia corpus and offers diversity in both query-document scenarios and tasks.
In the results below, adding RAG-Instruct improves both Llama3.2-3B and Llama3.1-8B on all eleven benchmarks listed.
| Model | WQA (acc) | PQA (acc) | TQA (acc) | OBQA (EM) | Pub (EM) | ARC (EM) | 2WIKI (acc) | HotP (acc) | MSQ (acc) | CFQA (EM) | PubMed (EM) |
|--------------------------------|-----------|-----------|-----------|-----------|----------|----------|-------------|------------|-----------|-----------|-------------|
| Llama3.2-3B | 58.7 | 61.8 | 69.7 | 77.0 | 55.0 | 66.8 | 55.6 | 40.2 | 13.2 | 46.8 | 70.3 |
| Llama3.1-8B | 59.5 | 60.8 | 73.4 | 82.0 | 56.7 | 77.1 | 65.6 | 45.6 | 18.7 | 56.5 | 73.9 |
| Llama3.2-3B + RAG-Instruct | 65.3 | 64.0 | 77.0 | 81.2 | 66.4 | 73.0 | 72.9 | 52.7 | 25.0 | 50.3 | 72.6 |
| Llama3.1-8B + RAG-Instruct | 69.7 | 68.4 | 79.3 | 84.8 | 77.2 | 79.9 | 79.3 | 56.4 | 30.3 | 57.8 | 77.0 |
For details, see our [paper](https://arxiv.org/abs/2501.00353) and [GitHub repository](https://github.com/FreedomIntelligence/RAG-Instruct).
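
The per-benchmark gains implied by the table above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the scores are copied from the table, while the names `BENCHMARKS`, `SCORES`, `gains`, and `mean_gain` are illustrative and are not part of the dataset or its repository.

```python
from statistics import mean

# Benchmark columns, in the same order as the results table.
BENCHMARKS = ["WQA", "PQA", "TQA", "OBQA", "Pub", "ARC",
              "2WIKI", "HotP", "MSQ", "CFQA", "PubMed"]

# Scores copied from the table: baseline row vs. "+ RAG-Instruct" row.
SCORES = {
    "Llama3.2-3B": {
        "base": [58.7, 61.8, 69.7, 77.0, 55.0, 66.8, 55.6, 40.2, 13.2, 46.8, 70.3],
        "rag_instruct": [65.3, 64.0, 77.0, 81.2, 66.4, 73.0, 72.9, 52.7, 25.0, 50.3, 72.6],
    },
    "Llama3.1-8B": {
        "base": [59.5, 60.8, 73.4, 82.0, 56.7, 77.1, 65.6, 45.6, 18.7, 56.5, 73.9],
        "rag_instruct": [69.7, 68.4, 79.3, 84.8, 77.2, 79.9, 79.3, 56.4, 30.3, 57.8, 77.0],
    },
}


def gains(model: str) -> dict[str, float]:
    """Absolute gain in points per benchmark for one model."""
    row = SCORES[model]
    return {
        name: round(after - before, 1)
        for name, before, after in zip(BENCHMARKS, row["base"], row["rag_instruct"])
    }


def mean_gain(model: str) -> float:
    """Unweighted mean gain across the eleven benchmarks."""
    return mean(gains(model).values())


if __name__ == "__main__":
    for model in SCORES:
        per_benchmark = gains(model)
        best = max(per_benchmark, key=per_benchmark.get)
        print(f"{model}: mean gain {mean_gain(model):.1f} points, "
              f"largest on {best} (+{per_benchmark[best]})")
```

The mean here is unweighted and mixes accuracy and exact-match columns, so it is best read as a rough summary rather than a metric reported by the authors.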
## Citation
If you find our data useful, please consider citing our work!
```
@misc{liu2024raginstructboostingllmsdiverse,
title={RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions},
author={Wanlong Liu and Junying Chen and Ke Ji and Li Zhou and Wenyu Chen and Benyou Wang},
year={2024},
eprint={2501.00353},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.00353},
}
```
Provider:
maas
Created:
2025-01-20



