DocAILab/FedE4RAG_Dataset
Source: Hugging Face | Updated: 2025-05-12 | Indexed: 2025-11-29
Download link:
https://hf-mirror.com/datasets/DocAILab/FedE4RAG_Dataset
Overview:
FedE4RAG_Dataset is the dataset accompanying the paper ***Privacy-Preserving Federal Embedding Learning for Localized Retrieval-Augmented Generation***, which addresses data scarcity and privacy challenges in private RAG systems. The approach uses federated learning (FL) to collaboratively train client-side RAG retrieval models while keeping raw data local. The dataset is split into training data, used for generating training data for the retrieval models, and test/validation data, used for downstream question-answering tests. Its documented fields include context, index, company information, question, and answer, among others. Part of the corpus is derived from the open-source dataset financebench.
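To make the federated learning idea in the overview concrete, here is a minimal sketch of one common aggregation scheme, federated averaging (FedAvg). It is an illustration under stated assumptions, not the paper's method: the overview does not say which aggregation rule is used, a toy linear model stands in for a retrieval embedding model, and the names `local_update`, `fed_avg`, and `train` are hypothetical. The point it shows is that only model weights leave a client, never the raw examples.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Sequence[float], float]  # (features x, target y)


def local_update(weights: Sequence[float], examples: Sequence[Example],
                 lr: float = 0.1) -> Vector:
    """Run one pass of SGD on a client's private data for the toy model y = w . x.

    Assumption: a linear model stands in for the client's retrieval model.
    The raw examples are only read here, on the client side.
    """
    w = list(weights)
    for x, y in examples:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> Vector:
    """Average client weights, weighting each client by its number of examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def train(clients: Sequence[Sequence[Example]], dim: int,
          rounds: int = 200) -> Vector:
    """Server loop: broadcast global weights, collect local updates, average them."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = fed_avg(updates, [len(data) for data in clients])
    return global_w


if __name__ == "__main__":
    # Two clients hold disjoint private samples of the same relation y = 2*x0 - x1.
    client_a = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]
    client_b = [([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
    print(train([client_a, client_b], dim=2))
```

In this sketch the server only ever sees the weight vectors returned by `local_update`; weighting the average by client size keeps a client with few examples from dominating the shared model.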
Provided by:
DocAILab



