Manually Annotated Dataset Based on Core Socialist Values Indicator System, 2024-2025
DataCite Commons. Updated 2025-09-18. Indexed 2026-05-04.
Download link:
http://opendata.pku.edu.cn/citation?persistentId=doi:10.18170/DVN/VB2COB
Resource description:
1. Purpose of Data Collection (Production)
The fundamental purpose of collecting and producing this dataset is to help news and public opinion work adapt to the emerging platform society and to the disruptive impact of AI large language models across regions and sectors. The immediate goal is to train AI algorithms and develop a Core Socialist Values (CSV) recognition and authentication model. The two purposes are interdependent: achieving the immediate goal, building a CSV recognition tool, is a prerequisite for the fundamental goal. Only with this tool can vast volumes of user-generated content (UGC) and professionally generated content (PGC) be leveraged effectively to improve both the quality and quantity of positive messaging, thereby enhancing the impact of positive content in platform communication. In addition, the dataset is well suited for AI model value-alignment training and can support research on mainstream media content ecosystems, political communication, cross-media comparisons, and related social science studies.

2. Method of Data Collection (Production)
First, our research team constructed a Core Socialist Values indicator system, consisting of 12 primary, 43 secondary, and 84 tertiary indicators across the "nation-society-citizen" dimensions. The system is based on Marxist classics, the theoretical documents of the Communist Party of China, and authoritative academic references, and was validated via archival literature review and cross-consistency assessments. Next, we collected textual data from the People's Daily Account and assembled a data annotation team of Peking University undergraduate, master's, and doctoral students specializing in the social sciences. This team manually annotated the dataset according to the CSV indicator system, producing a labeled dataset aligned with the Core Socialist Values Indicator System.

3. Data Coverage: Timeframe & Scope
We gathered approximately 15,000 media texts (2015-2025) from the People's Daily Account, annotated across 26 domains, including Social Governance and Economic Policy. The temporal scope and domain coverage of the annotated dataset are identical to those of the originally collected data.
Provider:
Peking University Open Research Data Platform
Created:
2025-08-11



