The Weibo blogs dataset related to the keyword "land expropriation (征地)" (in China, from April 2011 to December 2021).
Mendeley Data · Updated 2024-06-25 · Indexed 2024-06-26
Download link:
https://data.mendeley.com/datasets/m5fmbd3v5d
Description:
1. Weibo posts and time information. Using a web crawler, we collected 408,199 Weibo posts containing "land expropriation (征地)" published between August 4, 2011, and December 31, 2021. We then kept only the posts whose text contains the keyword "land expropriation (征地)", mainly to remove irrelevant posts picked up by the crawler. This left 364,071 relevant entries.

2. Weibo geographic information. The chinese_province_city_area_mapper tool (https://pypi.org/project/chinese_province_city_area_mapper/) was used to extract geographic information from the post texts. It identified a province-level location in 249,023 posts, a city-level location in 204,227 posts, and a district-level location in 105,813 posts. We observed that local media outlets and opinion leaders often include their location in their user nicknames, so we also extracted locations from nicknames to supplement posts whose text lacked geographic details. This raised the totals to 264,111 province-level, 223,722 city-level, and 106,361 district-level entries. Finally, we used the Baidu Maps API to add latitude and longitude for each extracted location; the quality of each match is recorded in the "similarity" field.

3. Privacy. To protect user privacy, the original text fields (the post text and nick_name) have been omitted.

Description of the data columns:
- 'info_key': the index of the post in the original text dataset.
- 'zhengdi': set to 1 for every filtered post, i.e. every post containing the keyword "land expropriation (征地)".
- 'Province', 'City', 'adcode': the geographic information extracted from the post text and nick_name.
- 'date', 'hour', 'min': the timestamp of the post.
- 'location', 'Lon', 'Lat': the geographic information returned by the Baidu Maps API.
- 'similarity': the matching degree of the location description returned by the Baidu Maps API.
- 'Province_zh', 'City_zh', 'District_zh': the province, city, and district names in Simplified Chinese.
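The keyword filter and the "text first, nickname as fallback" geotagging described in steps 1 and 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the tiny `TOY_GAZETTEER` dictionary is a hypothetical stand-in for chinese_province_city_area_mapper, the helper names (`is_relevant`, `extract_geo`, `tag_post`, `count_levels`) are hypothetical, the sample posts are made up, and it assumes the nickname is consulted only when the post text yields no location. The Baidu Maps geocoding step is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

KEYWORD = "征地"  # "land expropriation"

# HYPOTHETICAL toy gazetteer standing in for chinese_province_city_area_mapper:
# place-name fragment -> (province, city or None for province-only names).
TOY_GAZETTEER: Dict[str, Tuple[str, Optional[str]]] = {
    "广州": ("广东省", "广州市"),
    "杭州": ("浙江省", "杭州市"),
    "广东": ("广东省", None),
    "浙江": ("浙江省", None),
}


@dataclass
class GeoTag:
    province: str
    city: Optional[str]
    source: str  # "text" or "nick_name"


def is_relevant(text: str) -> bool:
    """Keyword filter: keep only posts whose text contains the keyword."""
    return KEYWORD in text


def extract_geo(s: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return the most specific (province, city) match found in s, if any."""
    province_only = None
    for fragment, (province, city) in TOY_GAZETTEER.items():
        if fragment in s:
            if city is not None:
                return province, city  # a city-level hit is the most specific
            if province_only is None:
                province_only = (province, None)
    return province_only


def tag_post(text: str, nick_name: str) -> Optional[GeoTag]:
    """Geotag from the text first; ASSUMED: use the nickname only if the text has no location."""
    hit = extract_geo(text)
    if hit is not None:
        return GeoTag(hit[0], hit[1], "text")
    hit = extract_geo(nick_name)
    if hit is not None:
        return GeoTag(hit[0], hit[1], "nick_name")
    return None


def count_levels(tags: List[Optional[GeoTag]]) -> Dict[str, int]:
    """Count posts that received a province-level and a city-level location."""
    return {
        "province": sum(1 for t in tags if t is not None),
        "city": sum(1 for t in tags if t is not None and t.city is not None),
    }


# Made-up toy posts for illustration: (text, nick_name)
posts = [
    ("广州某村征地补偿引发关注", "路人甲"),  # location in the text
    ("关于征地补偿标准的讨论", "杭州日报"),  # location only in the nickname
    ("今天天气不错", "广东新闻"),            # no keyword, filtered out
    ("征地公告已发布", "浙江网友"),          # province-only nickname
    ("征地话题", "某网友"),                  # no location anywhere
]
relevant = [(t, n) for t, n in posts if is_relevant(t)]
tags = [tag_post(t, n) for t, n in relevant]
print(len(relevant), count_levels(tags))
```

As in the dataset's reported totals, the nickname fallback can only add locations to posts whose text has none, so the province-level count is always at least the city-level count.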
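A typical first use of the released table is to check the listed columns and aggregate posts by province and year. The sketch below assumes the table is available as CSV with exactly the header names listed above, that 'date' is an ISO-like "YYYY-MM-DD" string, and that posts without a geocoded location have empty 'Lon'/'Lat' cells; none of this is confirmed by the description. The `min_similarity` threshold is a hypothetical parameter because the scale of 'similarity' is not documented here, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Optional, TextIO, Tuple

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = {
    "info_key", "zhengdi", "Province", "City", "adcode", "date", "hour", "min",
    "location", "Lon", "Lat", "similarity", "Province_zh", "City_zh", "District_zh",
}


def check_columns(fieldnames: Iterable[str]) -> None:
    """Raise if any documented column is missing from the file header."""
    missing = EXPECTED_COLUMNS - set(fieldnames)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")


def parse_coord(row: Dict[str, str]) -> Optional[Tuple[float, float]]:
    """Return (lon, lat), or None when the post has no geocoded location (ASSUMED empty cells)."""
    if not row["Lon"] or not row["Lat"]:
        return None
    return float(row["Lon"]), float(row["Lat"])


def posts_per_province_year(f: TextIO, min_similarity: float = 0.0) -> Counter:
    """Count geocoded posts per (Province_zh, year), skipping low-similarity matches."""
    reader = csv.DictReader(f)
    check_columns(reader.fieldnames or [])
    counts: Counter = Counter()
    for row in reader:
        if parse_coord(row) is None or not row["Province_zh"]:
            continue
        sim = row["similarity"]
        if not sim or float(sim) < min_similarity:  # HYPOTHETICAL threshold
            continue
        year = row["date"][:4]  # ASSUMES a 'YYYY-MM-DD' date string
        counts[(row["Province_zh"], year)] += 1
    return counts


# Invented sample rows in the assumed CSV layout.
SAMPLE = """info_key,zhengdi,Province,City,adcode,date,hour,min,location,Lon,Lat,similarity,Province_zh,City_zh,District_zh
1,1,Guangdong,Guangzhou,440100,2015-03-02,9,30,广州市,113.26,23.13,80,广东省,广州市,
2,1,Guangdong,Guangzhou,440100,2015-07-11,14,5,广州市,113.26,23.13,40,广东省,广州市,
3,1,,,,2016-01-20,8,0,,,,,,,
4,1,Zhejiang,Hangzhou,330100,2016-05-18,21,45,杭州市,120.16,30.27,90,浙江省,杭州市,
"""

print(posts_per_province_year(io.StringIO(SAMPLE)))
print(posts_per_province_year(io.StringIO(SAMPLE), min_similarity=50))
```

For the real file, replace `io.StringIO(SAMPLE)` with `open(path, encoding="utf-8", newline="")`; the column check fails early if the published header differs from the documented one.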
Created: 2024-06-19



