Synthetic Documents
Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/datasets/sarus-tech/medical_dirichlet_phi3
Description:
This dataset is a collection of synthetic documents created to test the DP-RAG algorithm. Because the documents are synthetic, the language model has no prior knowledge of their content before processing them. The dataset is designed to support privacy-preserving document retrieval and contains no real personal data. It comprises approximately 5,000 documents and is intended for evaluating differential privacy performance in document retrieval and generation.
Provider:
Sarus Tech
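
As a minimal sketch of how the dataset at the download link above might be fetched, the snippet below derives the repository id from the URL and passes it to `load_dataset` from the Hugging Face `datasets` library. The helper names `repo_id_from_url` and `load_documents` are hypothetical, and the split and column names of the dataset are not stated on this page, so the code makes no assumption about them.

```python
from urllib.parse import urlparse

# URL as listed under "Download link" above.
DATASET_URL = "https://huggingface.co/datasets/sarus-tech/medical_dirichlet_phi3"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repo id from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def load_documents(url: str = DATASET_URL):
    """Download the dataset (needs the `datasets` package and network access)."""
    # Imported lazily so the URL helper works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(url))
```

Calling `load_documents()` and printing the result should list whatever splits and columns the repository actually provides, which is a reasonable first step before relying on any particular field.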



