Public Policy Corpus - migu2023
Source: Western Data Exchange Center (西部数据交易中心)
Updated: 2024-12-03
Indexed: 2024-12-04
Download link:
https://westdex.com.cn/market/data/detail/8909
Resource summary:
The Public Policy Corpus - migu2023 is a specialized Chinese-language corpus for public policy analysis and evaluation. It aggregates 18,964 valid public policy documents officially released from 2020 to 2023 at the national level and by selected provincial, municipal, district/county, and sub-district/township governments. It comes in two variants: the Regular Corpus, 60.918 million tokens with word segmentation, part-of-speech tagging, and word-order logic annotations; and the Refined Corpus, 207,682 tokens with deduplicated word segmentation and clustered part-of-speech tagging. The corpus is updated annually. Following the five-stage corpus data standard of collection, cleaning, annotation, testing, and application, professional public policy researchers cleaned and annotated the data and screened it through multiple rounds of filtering. The corpus aligns across multiple dimensions with the latest standard requirements for emotional, cultural, and social value, and establishes a hierarchical, categorized annotation standard for specialized public policy vocabulary.
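The listing does not document the file format or how deduplication was performed. One plausible reading of "deduplicated word segmentation" is that the Refined Corpus keeps each distinct segmented word once, so its count behaves like a vocabulary size (types) rather than a running-text size (tokens). The sketch below illustrates that distinction under an assumed input shape: documents as lists of hypothetical `(word, pos)` pairs, with a made-up helper `type_token_stats`. It also reproduces two figures implied by the published counts: roughly 3,212 tokens per document in the Regular Corpus, and a Refined-to-Regular size ratio of about 0.34%.

```python
from collections import Counter

# Headline figures as published in the listing.
REGULAR_TOKENS = 60_918_000  # Regular Corpus: 60.918 million tokens
REFINED_TOKENS = 207_682     # Refined Corpus: 207,682 deduplicated tokens
DOCUMENTS = 18_964           # valid policy documents, 2020-2023

def type_token_stats(tagged_docs):
    """Return (running token count, distinct (word, POS) type count).

    tagged_docs: iterable of documents, each a list of (word, pos) pairs.
    This input shape is a hypothetical stand-in; the corpus's real file
    format is not described in the listing.
    """
    tokens = 0
    types = Counter()
    for doc in tagged_docs:
        tokens += len(doc)   # every occurrence counts as a token
        types.update(doc)    # repeated (word, pos) pairs collapse into one type
    return tokens, len(types)

# Toy pre-segmented, POS-tagged sentences (illustrative only).
toy = [
    [("推动", "v"), ("高质量", "n"), ("发展", "v")],
    [("推动", "v"), ("乡村", "n"), ("振兴", "v")],
]
tokens, types = type_token_stats(toy)
print(tokens, types)  # 6 5

print(round(REGULAR_TOKENS / DOCUMENTS))         # 3212
print(f"{REFINED_TOKENS / REGULAR_TOKENS:.2%}")  # 0.34%
```

If the Refined Corpus is indeed a deduplicated lexicon, the 0.34% figure is best read as a type-to-token ratio rather than as a sampled subset of the running text.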
Provider:
重庆迷殼科技有限公司
Created:
2024-12-03
The above content was collected and summarized by 遇见数据集.



