Public Policy Corpus migu2023 (公共政策语料包migu2023)
Source: Shanghai Data Exchange (上海数据交易所). Updated 2024-11-26; indexed 2024-12-16.
Download link:
https://nidts.chinadep.com/trading-market/product/detail?id=4997
Resource description:
Public Policy Corpus - migu2023 is a specialized Chinese corpus for public policy analysis and evaluation. It aggregates 18,964 valid public policy documents officially issued between 2020 and 2023 at the national level and by some provincial, municipal, district and county, and sub-district and town governments. The corpus comes in two subsets: the Regular Corpus, with 60.918 million tokens processed with word segmentation, part-of-speech tagging, and word-order logic; and the Curated Corpus, with 207,682 tokens processed with deduplicated word segmentation and clustered part-of-speech tagging. The material has gone through multiple rounds of cleaning, annotation, and screening by researchers specializing in public policy analysis, and is intended as specialized training data for vertical-domain models that analyze public policy text.
Provider:
重庆迷榖科技有限公司
Created:
2024-10-22
Collected summary
Dataset introduction

Background and challenges
Background overview
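Public Policy Corpus migu2023 is a specialized Chinese corpus for public policy analysis. It contains 18,964 national and local policy documents from 2020-2023, is offered as a regular corpus and a curated corpus, and is suited to training vertical-domain models.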
The content above was collected and summarized by 遇见数据集.



