Tibetan Amdo Dialect Region Government News Chinese Public Text Dataset

Source: DataCite Commons. Updated 2026-02-12. Indexed 2026-05-05.
Download link:
https://www.scidb.cn/detail?dataSetId=478dc5caa3b0440784c3e36874b5dbc3
Description:
1. In recent years, as Internet technology has developed, online platforms have become a major channel for news dissemination. This has driven rapid growth in public text data and formed a large-scale digital public sphere.
2. This work crawls Mandarin news texts from 18 official government websites in the Amdo dialect area and constructs a public text dataset of Mandarin government news from the Tibetan Amdo dialect area, covering 2013 to 2024. The dataset contains 260,018 texts in total, which were processed into three datasets: a raw text dataset, a main text dataset, and a topic classification dataset. Unlike traditional news datasets, it focuses on ethnic regions and on classification by industry. It provides a corpus foundation for natural language processing and linguistic research, and it offers new empirical data for exploring practical paths for promoting Mandarin in ethnic regions and for strengthening the mechanisms that disseminate a sense of community for the Chinese nation.
3. Because storage time limits and content distribution differ across websites, some regions lack text content within the time period.
Provider:
Science Data Bank
Created:
2026-01-05