
Sensitive-Word Desensitized Data from Award-Winning Digital Government Cases

Zhejiang Provincial Data Intellectual Property Registration Platform (浙江省数据知识产权登记平台) · Updated 2025-09-24 · Indexed 2025-09-25
Download link:
https://www.zjip.org.cn/home/announce/trends/185335
Resource description:
Applicability and scope: Pre-policy research, strategic planning, and digital transformation program design within government, where case materials containing sensitive information must be consulted while the original data subjects remain strictly protected.

Target users: Government policy researchers, strategic planners, design teams for digital government projects, and similar roles.

Core problem addressed: Using original cases that contain sensitive information directly carries a leakage risk, while masking key information entirely strips the data of its analytical value. Sensitive-word desensitized data resolves this conflict between data usability and security by standardizing, generalizing, or replacing the key sensitive entities in the original cases.

Effects:
1. Provides a basis for secure analysis: Desensitized data lets researchers analyze in depth the industry trends, successful models, potential risks, and lessons learned contained in the cases without compromising security.
2. Improves the quality of decision-making and planning: Patterns and empirical experience extracted from desensitized data give a more solid, evidence-based foundation for evaluating policy options, drawing up medium- and long-term development plans, and designing specific government digital transformation projects. This reduces guesswork in decision-making and makes policies and plans more predictable and easier to implement.
3. Promotes secure reuse of knowledge: The knowledge assets accumulated by the government can be shared and reused safely, which speeds up the accumulation and dissemination of digital transformation experience.

Data collection: Records are collected and entered from the award-winning case materials published each year on the official website of the Smart China Annual Conference (智慧中国年会).

Sensitive-word lexicon: A sensitive-word lexicon is built according to preset rules. Terms in the lexicon are classified by the data field they belong to, mainly case name, case overview, featured highlights, and case attachment address, and the sensitive data type of each term is determined.

Sensitive data recognition and desensitization: The original dataset is imported. Within the sensitive data recognition model, a KNN algorithm retrieves and compares the data in the original dataset against the terms in the lexicon. When a lexicon term is found, the model determines whether it is sensitive data and, if so, marks it. The model then desensitizes every term in the original data that requires desensitization.

Model training and optimization: Updated data and sensitive data recognition results are added to the original dataset, and the updated dataset is used as part of the training corpus for the sensitive data recognition model.

Example: The original case attachment address is [{"url":"http://60.163.157.162:31683/gds-data/20241234/滨江区-数智融合下的智慧治理.docx"}], which contains the file address of the case. If leaked, it could lead to the loss of company resources. The sensitive data recognition model marks and desensitizes information in the case attachment address category. After desensitization, the attachment address is [{"url":"gds-data/20241234/滨江区-数智融合下的智慧治理.docx"}]
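
The attachment-address example above can be reproduced with a short Python sketch. It covers only the final desensitization step for the case attachment address category and does not attempt to reproduce the KNN-based recognition model. The function names `strip_origin` and `desensitize_attachments` are hypothetical, and the rule "drop the scheme, host, and port, keep the path" is an assumption inferred from the single before/after pair given in the description.

```python
import json
from urllib.parse import urlsplit


def strip_origin(url: str) -> str:
    """Drop scheme, host and port, keeping only the relative path.

    Assumption: any query string or fragment is also dropped, since the
    source example contains neither.
    """
    return urlsplit(url).path.lstrip("/")


def desensitize_attachments(raw: str) -> str:
    """Desensitize every "url" entry in a JSON list of attachment records."""
    items = json.loads(raw)
    for item in items:
        item["url"] = strip_origin(item["url"])
    # ensure_ascii=False keeps the Chinese file name readable;
    # compact separators match the layout used in the source example.
    return json.dumps(items, ensure_ascii=False, separators=(",", ":"))


original = (
    '[{"url":"http://60.163.157.162:31683/gds-data/20241234/'
    '滨江区-数智融合下的智慧治理.docx"}]'
)
masked = desensitize_attachments(original)
print(masked)
```

Parsing the URL with `urlsplit` instead of a hand-written regular expression keeps the sketch robust to different ports and schemes. A production rule set would presumably also need to handle the other field categories (case name, case overview, featured highlights), which the description does not specify in enough detail to sketch here.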
Provider:
国脉互联数字发展(浙江自贸区)有限公司
Created:
2025-07-18
Compiled summary
Dataset introduction
Background and challenges
Background overview
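The dataset contains 605 desensitized records of award-winning digital government cases. It is updated annually, stored in xlsx format, and intended for internal government policy research and digital transformation work. Sensitive words are desensitized using a KNN algorithm, balancing data security against usability so that the data can support decision analysis without exposing the original sensitive information.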
The content above was collected and summarized by 遇见数据集.