Public Policy Corpus migu2023
Shanghai Data Exchange | Updated 2025-01-10 | Indexed 2026-01-11
Download link:
https://nidts.chinadep.com/trading-market/product/detail?id=5626
Resource description:
Public Policy Corpus migu2023 is a specialized Chinese-language corpus for public policy analysis and evaluation. It aggregates 18,964 valid public policy documents officially issued over the years at the national level and by some provinces, municipalities, districts, counties, sub-districts, and towns. The material is packaged in two editions: a regular corpus of 60.918 million tokens (word-segmented, part-of-speech tagged, and annotated for word-order logic) and a curated corpus of 207,682 tokens (deduplicated and word-segmented, with clustered part-of-speech tagging). Researchers specializing in public policy analysis cleaned, annotated, and screened the corpus over multiple rounds, so that it can serve as domain-specific training data for vertical models that perform public policy text analysis.
Provider:
重庆迷榖科技有限公司
Created:
2025-01-10
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
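Public Policy Corpus migu2023 is a specialized Chinese corpus dataset focused on public policy analysis. It aggregates nearly 19,000 valid policy documents issued at the national and local levels and comes in a regular edition and a curated edition, containing tens of millions and hundreds of thousands of tokens respectively. The dataset has undergone multiple rounds of professional cleaning and annotation, including word segmentation, part-of-speech tagging, and word-order logic processing, and is intended to provide high-quality public policy text for training vertical AI models.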
The above content was collected and summarized by 遇见数据集.



