Social Media Data Related to Typhoon Disaster Events in China, 2021
National Earth Observation Data Center | Updated 2022-06-22 | Indexed 2024-03-04
Download link:
https://www.chinageoss.cn/datasharing/datasetDetails/62629e6c4984d37e565d5759
Resource description:
This social media dataset is intended for big data mining. It can be used to study methods for mining knowledge about vague locations and spatial relationships from text spanning multiple documents, and to express spatial locations more precisely. The data were collected with a focused web crawler that targeted online texts about typhoons that affected China, recording basic metadata for each text, such as publication time, source website, and URL. The data were then cleaned through manual screening and cross-checking: duplicate records were removed, records whose main content was empty were discarded, and erroneous data were corrected, yielding the raw Internet typhoon dataset.
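The description above says the cleaning was done through manual screening and cross-checking. The two mechanical steps it names (dropping records with empty main content and removing duplicates) could also be sketched in code. The following is only an illustrative sketch, not the data center's actual procedure: the `Record` class, its field names, and the rule that two records are duplicates when their whitespace-normalized text is identical are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One crawled text. Field names are hypothetical, not the dataset's schema."""
    publish_time: str
    source_site: str
    url: str
    content: str


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return " ".join(text.split())


def clean_records(records):
    """Drop records with empty main content, then drop duplicate texts.

    Assumption: a duplicate is a record whose normalized content has
    already been seen. The first occurrence is kept, in input order.
    """
    seen = set()
    cleaned = []
    for rec in records:
        text = normalize(rec.content)
        if not text:
            continue  # main content is empty
        if text in seen:
            continue  # duplicate of an earlier record
        seen.add(text)
        cleaned.append(rec)
    return cleaned
```

Correcting erroneous data, the third step in the description, depends on human judgment and is deliberately left out of this sketch.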
Created:
2022-06-22



