Privacy-Preserving Computation Dataset

Name: Privacy-Preserving Computation Dataset (隐私计算数据集)
Creator: Zhejiang University
License: No description available

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2026-01-17

Download link:

https://nbsdc.cn/general/dataDetail?id=6967bdae195d26230e9b11b0&type=1

Resource description:

This project targets three social risk prevention and control scenarios: financial anti-money laundering (AML), telecom anti-fraud, and online content risk control. For these scenarios it constructs the "Privacy-Preserving Computation Dataset", intended for testing privacy-preserving computation operators. The dataset was generated by automated scripts after a thorough analysis of the data structures of actual production systems and of cross-institution collaborative business processes. It covers abstract entities such as multiple institutions, multiple accounts, and multiple behavior types, and provides unified test data for 15 privacy-preserving computation operators spanning homomorphic operations, aggregate statistics, set operations, and queries.

Data development and experiments ran from August 2024 to May 2025 at Hangzhou Hyperchain Technology Co., Ltd. (杭州趣链科技有限公司). Records carry conventional business time fields, expressed as dates and timestamps, to support ordered statistics and window calculations. The spatial scope uses virtual entities such as institutions and accounts as its basic units and does not directly involve real geographic coordinates.

Quality control during data generation includes field constraint checks, primary and foreign key consistency checks, comparison of distribution statistics, and comparison of plaintext computation results against privacy-preserving computation results, so that operator functional verification and performance evaluation are reliable. The dataset combines the structural characteristics of real business data with the compliance advantages of synthetic data. It can serve as a reproducible benchmark environment for functional testing, performance calibration, and algorithm research on multi-party privacy-preserving computation platforms, and it has significant application value for trusted data sharing and secure data use across institutions.
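The quality-control steps named in the description (field constraint checks, key consistency checks, and comparing a plaintext result with an operator's output) can be illustrated with a small sketch. This is not the dataset's actual generator or schema: the table layout, the field names (`inst_id`, `account_id`, `tx_id`, `amount`, `ts`), the value ranges, and the start date are all hypothetical, chosen only to show the shape of such checks on seeded synthetic data.

```python
import random
from datetime import datetime, timedelta


def generate(seed=0, n_institutions=3, n_accounts=20, n_transactions=200):
    """Generate hypothetical institution, account, and transaction tables."""
    rng = random.Random(seed)  # seeded so the data is reproducible
    institutions = [{"inst_id": f"I{i:03d}"} for i in range(n_institutions)]
    accounts = [
        {"account_id": f"A{i:05d}", "inst_id": rng.choice(institutions)["inst_id"]}
        for i in range(n_accounts)
    ]
    start = datetime(2024, 1, 1)  # arbitrary start date (assumption)
    transactions = []
    for i in range(n_transactions):
        src, dst = rng.sample(accounts, 2)  # two distinct accounts
        transactions.append({
            "tx_id": f"T{i:06d}",
            "src_account": src["account_id"],
            "dst_account": dst["account_id"],
            "amount": round(rng.uniform(1.0, 50000.0), 2),
            "ts": start + timedelta(seconds=rng.randrange(30 * 24 * 3600)),
        })
    return institutions, accounts, transactions


def check_constraints(transactions):
    """Field constraint check: return tx_ids with a non-positive amount
    or with identical source and destination accounts."""
    return [
        t["tx_id"] for t in transactions
        if t["amount"] <= 0 or t["src_account"] == t["dst_account"]
    ]


def check_foreign_keys(institutions, accounts, transactions):
    """Key consistency check: return ids of rows that reference a
    missing institution or a missing account."""
    inst_ids = {i["inst_id"] for i in institutions}
    acct_ids = {a["account_id"] for a in accounts}
    orphans = [a["account_id"] for a in accounts if a["inst_id"] not in inst_ids]
    orphans += [
        t["tx_id"] for t in transactions
        if t["src_account"] not in acct_ids or t["dst_account"] not in acct_ids
    ]
    return orphans


def plaintext_sum_by_institution(accounts, transactions):
    """Plaintext reference for an aggregate operator: total outgoing
    amount grouped by the source account's institution."""
    inst_of = {a["account_id"]: a["inst_id"] for a in accounts}
    totals = {}
    for t in transactions:
        inst = inst_of[t["src_account"]]
        totals[inst] = totals.get(inst, 0.0) + t["amount"]
    return totals


def matches_reference(reference, candidate, tol=1e-6):
    """Compare an operator's output with the plaintext reference."""
    return reference.keys() == candidate.keys() and all(
        abs(reference[k] - candidate[k]) <= tol for k in reference
    )
```

In use, the dictionary returned by `plaintext_sum_by_institution` would be passed to `matches_reference` together with the decrypted output of the privacy-preserving aggregation under test. The tolerance is an assumption as well, since some schemes return approximate values.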

Provider:

Zhejiang University

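Collected summary

Dataset introduction

Background and challenges

Background overview

Zhejiang University built this dataset specifically for testing privacy-preserving computation operators. It focuses on three risk prevention and control scenarios (financial anti-money laundering, telecom anti-fraud, and online content risk control) and provides simulated test data covering 15 operators. Combining real business structure with the compliance of synthetic data, it aims to give multi-party privacy-preserving computation platforms a reproducible benchmark environment that supports functional verification and algorithm research.

The above content was collected and summarized by 遇见数据集.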