Geospatiality_data

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/13941044
Description:
This repository contains code and data for reproducing the study "Geospatiality: The effect of topics on the presence of geolocation in English text data". The study analyzed the frequency of geolocations in texts across several distinct datasets from different sources:

- Twitter (X)
- Reddit
- Stackexchange
- GDELT
- IA-Americana
- Nairaland

For each source, a dataset was acquired, tested for the presence of geolocations in the texts, and annotated with topic labels.

The scripts take as input the data from the zip files in the data directory. The files need to be unzipped before running the scripts. Note that usernames have been anonymized.

- E_Modeling.R applies the mixed modeling approach described in the article.
- F1_Analyze_FracGeo.R produces figures and tables visualising FracGeo, the fraction of geolocated text items per supertopic and dataset (Table 3 and Figure 3).
- F2_Explore_Variables.R analyses FracGeo across timesteps, authors, and text length (Figure 4).
- F3_Analyze_Models.R analyses the fixed effects of the GLMM models for each dataset and compares their correlation across datasets (Table 4, Figure 5, and Appendices A1-A6).
- F4_Validate.R compares the georeferences and supertopic assignments of the models to the human annotations (Appendix 9 and Table 5).

The file topic_taxonomy.xlsx contains the topic taxonomy, which matches topics to site-specific categories (e.g. subreddits, subforums, Stackexchange sites). For users without access to MS Office, the file can be loaded using open scripting languages, for example R:

    library(openxlsx2)
    path <- "../2_Data_Processing/Topic_taxonomy.xlsx"
    tax_reddit <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Reddit")
    tax_Stackexchange <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Stackexchange")
    tax_Nairaland <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_Nairaland")
    tax_GDELT <- openxlsx2::wb_read(path, sheet = "Topic_Taxonomy_GDELT")
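As an illustration of the FracGeo measure that F1_Analyze_FracGeo.R reports (the fraction of geolocated text items per supertopic and dataset), the following is a minimal base-R sketch. It is not the repository's code: the data frame `items`, its column names (`dataset`, `supertopic`, `has_geo`), and all values are hypothetical toy data invented for this example.

```r
# Toy data only: column names and values are hypothetical,
# not taken from the repository or the study.
items <- data.frame(
  dataset    = c("Reddit", "Reddit", "Reddit", "GDELT", "GDELT", "GDELT"),
  supertopic = c("Travel", "Travel", "Gaming", "Travel", "Travel", "Travel"),
  has_geo    = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# FracGeo = share of text items that contain at least one geolocation,
# computed per dataset and supertopic. The mean of a logical vector
# is the proportion of TRUE values.
frac_geo <- aggregate(has_geo ~ dataset + supertopic, data = items, FUN = mean)

# Rename the aggregated column so the result reads as FracGeo.
names(frac_geo)[names(frac_geo) == "has_geo"] <- "FracGeo"

print(frac_geo)
```

With real data, `has_geo` would presumably come from the georeferencing step, and `supertopic` from the topic annotation mapped through the taxonomy file.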
Created:
2025-04-12