
A dataset of Gansu online media texts and neologisms for research on language life in ethnic areas

DataCite Commons | Updated 2025-09-03 | Indexed 2025-04-16
Download link:
https://www.scidb.cn/detail?dataSetId=04a2603db045409a90d8226c8098d5f6
Description:
This study constructed a dataset of online media texts from Gansu Province covering 2013 to 2022. The data come from six major online media platforms in Linxia Hui Autonomous Prefecture and Gannan Tibetan Autonomous Prefecture, including the Linxia Prefecture Government Website, Ethnic Daily, China Linxia Website, Shambhala Online, and China Gannan Website. The dataset spans a decade and covers a wide range of social, cultural, and linguistic aspects of Gansu's ethnic areas; all texts are Chinese-language news reports and commentaries. Neologisms were extracted from each year's data and characterized by word frequency, part of speech, word length, cohesion, degrees of freedom, and neologism probability. Quality control during construction included manual proofreading, noise filtering, deduplication, and language annotation to ensure accuracy and completeness. The dataset provides foundational data for research on language use, social and cultural dynamics, and the development of bilingual education in ethnic areas, and can also support policy analysis, public opinion monitoring, and language policy research.
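The description names cohesion and degrees of freedom as neologism features but does not give their formulas. The sketch below shows one common way such scores are computed for character n-grams, and it is an assumption rather than the dataset authors' method: cohesion is taken as the smallest ratio P(w) / (P(a)P(b)) over all two-part splits of a candidate w, and degrees of freedom as the smaller of the left and right neighbor entropies. The function name `candidate_stats` and the parameter `max_len` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def candidate_stats(text, max_len=4):
    """Return {candidate: (frequency, cohesion, freedom)} for n-grams of 2..max_len characters.

    Assumed definitions (not taken from the dataset description):
      cohesion = min over binary splits a+b of P(w) / (P(a) * P(b))
      freedom  = min(left-neighbor entropy, right-neighbor entropy)
    """
    n = len(text)
    counts = Counter()
    left = defaultdict(Counter)   # characters seen immediately before each n-gram
    right = defaultdict(Counter)  # characters seen immediately after each n-gram

    for i in range(n):
        for k in range(1, max_len + 1):
            if i + k > n:
                break
            w = text[i:i + k]
            counts[w] += 1
            if i > 0:
                left[w][text[i - 1]] += 1
            if i + k < n:
                right[w][text[i + k]] += 1

    def prob(w):
        # Relative frequency, normalized by text length for simplicity.
        return counts[w] / n

    def entropy(neighbors):
        total = sum(neighbors.values())
        if total == 0:
            return 0.0
        return -sum(v / total * math.log(v / total) for v in neighbors.values())

    stats = {}
    for w, freq in counts.items():
        if len(w) < 2:
            continue  # single characters cannot be split
        cohesion = min(prob(w) / (prob(w[:j]) * prob(w[j:])) for j in range(1, len(w)))
        freedom = min(entropy(left[w]), entropy(right[w]))
        stats[w] = (freq, cohesion, freedom)
    return stats
```

Candidates that score high on all three values (frequency, cohesion, freedom) would then be kept as likely words and compared against an existing lexicon to flag neologisms; the thresholds are corpus-dependent and are not stated in the description.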
Provider:
Science Data Bank
Created:
2024-10-12