Tibetan Amdo Dialect Region Government News Chinese Public Text Dataset
Source: DataCite Commons | Updated: 2026-02-12 | Indexed: 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=478dc5caa3b0440784c3e36874b5dbc3
Description:
1. In recent years, as Internet technology has developed, online platforms have become a major channel for news dissemination. This has driven a rapid increase in public text data and formed a large-scale digital public sphere.
2. This work crawls Mandarin news texts from 18 official government websites in the Amdo dialect area and constructs a public text dataset of Mandarin government news in the Tibetan Amdo dialect area, covering 2013 to 2024. The dataset contains 260,018 texts in total, which were processed to produce three datasets: a raw text dataset, a main text dataset, and a topic classification dataset. Unlike traditional news datasets, it focuses on ethnic regions and on different industry classifications. It provides a corpus foundation for natural language processing and linguistic research, as well as new empirical data for exploring practical paths for Mandarin promotion in ethnic regions and for strengthening the dissemination mechanism of Chinese national community consciousness.
3. Because storage time limits and content distribution differ across websites, some regions lack text content within the time period.
Provider:
Science Data Bank
Created:
2026-01-05