Public Policy Corpus migu2023
Source: Shanghai Data Exchange (上海数据交易所) | Updated: 2025-01-09 | Indexed: 2026-03-21
Download link:
https://nidts.chinadep.com/reg-hall/product-detail?id=5626
Resource summary:
Public Policy Corpus migu2023 is a specialized Chinese corpus for public policy analysis and evaluation. It aggregates 18,964 valid public policy documents officially released over the years by national authorities and by selected provincial, municipal, district/county, and sub-district/town governments. Two subsets are provided: the Regular Corpus (60.918 million tokens), annotated with word segmentation, part-of-speech tags, and word-order (syntactic) logic; and the Curated Corpus (207,682 tokens), with deduplicated word segmentation and clustered part-of-speech tags. Researchers specializing in public policy analysis cleaned, annotated, and screened the corpus over multiple rounds, so that it can supply training data for vertical (domain-specific) models that analyze public policy texts.
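The original Chinese listing states the token counts in 万 (1 万 = 10,000): 6091.8万 for the Regular Corpus and 20.7682万 for the Curated Corpus. The sketch below converts those figures to absolute counts and derives two ratios. It uses `Decimal` so the conversion is exact. It assumes that all 18,964 documents contribute to the Regular Corpus, which the listing implies but does not state outright. It also assumes that "token" means a unit from the provider's own segmentation. The helper `wan_to_int` is hypothetical and exists only for illustration.

```python
from decimal import Decimal

# 1 万 (wan) = 10,000; the Chinese listing gives token counts in this unit.
WAN = Decimal(10_000)


def wan_to_int(value: str) -> int:
    """Hypothetical helper: convert a count written in 万 (e.g. "6091.8") to an int.

    Decimal avoids binary floating-point error in the multiplication.
    """
    return int(Decimal(value) * WAN)


DOCUMENTS = 18_964                       # valid policy documents, per the listing
regular_tokens = wan_to_int("6091.8")    # Regular Corpus
curated_tokens = wan_to_int("20.7682")   # Curated Corpus

print(f"Regular Corpus: {regular_tokens:,} tokens")
print(f"Curated Corpus: {curated_tokens:,} tokens")
# Size of the curated subset relative to the regular one.
print(f"Curated / Regular: {curated_tokens / regular_tokens:.2%}")
# Assumes every document contributes to the Regular Corpus.
print(f"Mean tokens per document (Regular): {regular_tokens / DOCUMENTS:,.1f}")
```

Under these assumptions, the Curated Corpus is about 0.34% the size of the Regular Corpus, and an average document contributes roughly 3,212 tokens.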
Provider:
重庆迷榖科技有限公司 (Chongqing Migu Technology Co., Ltd.)
Created:
2025-01-09
Compiled summary
Dataset introduction

Background and challenges
Background overview
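Public Policy Corpus migu2023 is a specialized Chinese corpus for public policy analysis. It aggregates 18,964 valid policy documents officially released over the years by national authorities and by selected provincial, municipal, district/county, and sub-district/town governments, so its coverage runs from the national level down to the sub-district level. It offers a Regular Corpus (about 60.918 million tokens) and a Curated Corpus (about 207,682 tokens). Both subsets have been word-segmented and part-of-speech tagged, then refined through multiple rounds of expert cleaning and annotation, to provide high-quality public policy text for training vertical (domain-specific) models.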
The above content was collected and summarized by 遇见数据集 (a dataset aggregator).



