Youtu-Graph/AnonyRAG
Source: Hugging Face | Updated: 2025-08-31 | Indexed: 2025-11-01
Download link:
https://hf-mirror.com/datasets/Youtu-Graph/AnonyRAG
Description:
AnonyRAG is a question-answering dataset available in both English and Chinese. It uses entity anonymization to isolate an LLM's parametric knowledge, so that evaluation measures more precisely how well the LLM integrates retrieved information in a RAG system. The dataset is built from the original texts of four classic novels: Water Margin, Dream of the Red Chamber, Moby-Dick, and Middlemarch, all of which are in the public domain. The data comes in two formats. The question format contains the question, the answer, and the related knowledge graph relations and entities. The text-chunk format contains an index, a title, and the anonymized text chunk.
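This listing does not describe how the anonymization is actually performed, so the following is only a minimal sketch of the general idea: every surface form of an entity is replaced by a stable placeholder, which keeps the text internally consistent while removing names a model may have memorized. The function name `anonymize`, the alias table, and the placeholder format (`PERSON_1`, `WHALE_1`) are all hypothetical and are not taken from the dataset.

```python
import re


def anonymize(text, aliases):
    """Replace every known entity alias in `text` with its placeholder.

    `aliases` maps a surface form to a placeholder. Both the mapping and
    the placeholder format are assumptions made for this illustration.
    """
    if not aliases:
        return text
    # Try longer aliases first so "Captain Ahab" wins over "Ahab".
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.sub(lambda match: aliases[match.group(0)], text)


# Hypothetical alias table: two surface forms share one placeholder.
aliases = {
    "Captain Ahab": "PERSON_1",
    "Ahab": "PERSON_1",
    "Moby Dick": "WHALE_1",
}

chunk = "Ahab pursues Moby Dick. Captain Ahab will not turn back."
anonymized_chunk = anonymize(chunk, aliases)
print(anonymized_chunk)
```

Because both aliases of the same character map to one placeholder, a question about `PERSON_1` can still be answered from the retrieved chunk, but not from prior knowledge of the novel.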
Provided by:
Youtu-Graph



