AmazonScience/XRAG
Source: Hugging Face | Updated: 2025-08-11 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/AmazonScience/XRAG
Description:
XRAG is a benchmark dataset for evaluating the generation capabilities of large language models (LLMs) in cross-lingual retrieval-augmented generation (RAG). It covers two cross-lingual RAG scenarios. In the monolingual retrieval scenario, the question is non-English and the retrieved documents are in English. In the multilingual retrieval scenario, the question is non-English and the retrieved documents include both English and the question's language. The current version covers four non-English languages: Arabic (ar), Chinese (zh), German (de), and Spanish (es), and provides English versions of the non-English questions. The dataset is split into a development set and a test set. The data is stored in JSON lines format: each line is a JSON dictionary with keys such as the question, answer, question type, answer type, and article information.
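Because the data is stored as JSON lines, a split can be read one record at a time. The following is a minimal Python sketch under stated assumptions: the key names `question`, `answer`, `question_type`, and `answer_type` are guesses based on the description above, not the confirmed schema, and the sample records are placeholders rather than real XRAG data. `read_jsonl` and `count_by` are hypothetical helper names.

```python
import io
import json
from collections import Counter


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON lines stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


def count_by(records, key):
    """Count records per value of `key`; absent keys are counted as '<missing>'."""
    return Counter(record.get(key, "<missing>") for record in records)


# Placeholder records. The key names are ASSUMED from the description;
# check the actual files for the real schema before relying on them.
SAMPLE_LINES = (
    '{"question": "q1", "answer": "a1", "question_type": "type_a", "answer_type": "x"}\n'
    "\n"
    '{"question": "q2", "answer": "a2", "question_type": "type_b", "answer_type": "x"}\n'
    '{"question": "q3", "answer": "a3", "question_type": "type_a"}\n'
)

if __name__ == "__main__":
    records = list(read_jsonl(io.StringIO(SAMPLE_LINES)))
    print(len(records), "records")
    print(count_by(records, "question_type"))
```

To read a real split, pass an open file handle instead of the in-memory sample, for example `open(path, encoding="utf-8")`, where `path` points to a downloaded development or test file.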
Provided by:
AmazonScience



