Simulated Dataset for the On-Demand Desensitization Toolkit for Personal Sensitive Information

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2026-01-30
Download link:
https://nbsdc.cn/general/dataDetail?id=67d50f26195d260905af9a7c&type=1
Resource summary:
This dataset was generated in accordance with the General Technical Specification for Public Security Big Data Processing (GA/DSJ 200-2019), issued by the Ministry of Public Security of the People's Republic of China. It contains personal information for 60 million individuals, one record per person. Each record carries a resident identity card number that is unique across the entire dataset, and a row-number column serves as the record index; row numbers auto-increment globally and are likewise unique per record. Each record covers 37 information attributes. Attributes 1 to 32 are stored as tables in XLSX and CSV files. Attributes 33 to 37 are five further modalities (personal integrity statement documents, user images, user videos, user audio, and user trajectory graphics), which were either selected from public third-party datasets to form a test set or produced by synthetic generation.
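The two record-level guarantees described above (globally unique ID numbers and a globally auto-incrementing row number) can be spot-checked when loading the CSV files. The following is a minimal Python sketch, assuming each CSV has a header row; the helper `check_record_invariants` and the column names `row_no` and `id_number` are hypothetical placeholders, since the actual headers are not documented here. It interprets "auto-increment" as "strictly increasing" and does not require consecutive row numbers to differ by exactly 1.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def check_record_invariants(rows: Iterable[Dict[str, str]],
                            id_col: str = "id_number",  # hypothetical header name
                            row_col: str = "row_no",    # hypothetical header name
                            ) -> List[str]:
    """Check that IDs are unique and row numbers strictly increase.

    Returns a list of problem descriptions; an empty list means both checks passed.
    """
    problems: List[str] = []
    seen_ids = set()              # every ID seen so far; memory grows with the row count
    prev_row: Optional[int] = None
    for index, record in enumerate(rows):
        ident = record[id_col]
        if ident in seen_ids:
            problems.append(f"record {index}: duplicate ID {ident}")
        seen_ids.add(ident)

        row_no = int(record[row_col])
        # Auto-increment is checked as "strictly greater than the previous row number".
        if prev_row is not None and row_no <= prev_row:
            problems.append(f"record {index}: row number {row_no} is not greater than {prev_row}")
        prev_row = row_no
    return problems


if __name__ == "__main__":
    # Tiny made-up sample, not taken from the dataset: record 2 repeats an ID and a row number.
    sample = io.StringIO("row_no,id_number\n1,A001\n2,A002\n2,A001\n")
    for problem in check_record_invariants(csv.DictReader(sample)):
        print(problem)
    # prints:
    # record 2: duplicate ID A001
    # record 2: row number 2 is not greater than 2
```

To test the "global" guarantees across several files, the readers can be concatenated with `itertools.chain` and passed as one sequence. A CSV stored inside a ZIP archive can be read without extracting it by wrapping `zipfile.ZipFile(path).open(member)` in `io.TextIOWrapper(..., encoding=..., newline="")`; the actual file encoding (for example UTF-8 or GBK) is an assumption to confirm. Holding 60 million IDs in a Python set will likely need several gigabytes of memory, so for the full dataset a disk-backed approach such as an external sort or a database unique index may be more practical.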
Provider:
Institute of Information Engineering, Chinese Academy of Sciences
Dataset Introduction
Background and Challenges
Background Overview
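Generated according to a technical specification of the Ministry of Public Security, the dataset contains 60 million uniquely identified personal information records spanning tabular and multimodal data, and is intended for testing and simulating privacy-protection toolkits. Its total size is 36.05 GB, stored in formats including XLSX, CSV, and ZIP. It is published by the Institute of Information Engineering, Chinese Academy of Sciences.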
The summary above was collected and generated by 遇见数据集.