Spatial distributions of languages on Twitter
Source: DataCite Commons. Updated: 2022-08-25. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/Spatial_distributions_of_languages_extracted_from_Twitter/14339321
Description:
This is a collection of GeoJSON files containing the counts of users of local language groups in every cell of a grid laid over several regions of interest. The cells are squares defined in a projected coordinate system adapted to each country, and their side length X is specified in the file names (cell_size=Xm).

These counts were obtained by processing geo-located tweets posted between 2015 and 2019 in these regions. The tweets were collected through Twitter's streaming API, specifically the "statuses/filter" endpoint (see Ref. 1), which provides a real-time sample of tweets matching the supplied filters. Bounding box filters were set to collect tweets from a set of countries of interest. Before reproducing this method of data collection, bear in mind that the current form and even the availability of this endpoint are subject to future changes introduced by Twitter's developer team. The code used for this processing and for visualizing the data is available on GitHub (see Ref. 2).
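As a rough illustration of how such files might be consumed, the sketch below parses the cell size from a file name and sums the per-cell counts of a GeoJSON FeatureCollection. The file name, the property names (`count_lang_a`, `count_lang_b`) and the sample geometry are hypothetical, and the sketch assumes X is written as a whole number of metres; the actual naming and schema should be checked against the files themselves.

```python
import json
import re
from collections import Counter


def cell_size_from_name(file_name):
    """Return the cell side in metres encoded as 'cell_size=Xm', or None if absent."""
    match = re.search(r"cell_size=(\d+)m", file_name)
    return int(match.group(1)) if match else None


def total_counts(feature_collection):
    """Sum every numeric property over all cells of a GeoJSON FeatureCollection."""
    totals = Counter()
    for feature in feature_collection.get("features", []):
        for key, value in (feature.get("properties") or {}).items():
            # bool is a subclass of int in Python, so exclude it explicitly
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[key] += value
    return dict(totals)


# Hypothetical two-cell sample; real property names may differ.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
            },
            "properties": {"count_lang_a": 12, "count_lang_b": 3},
        },
        {
            "type": "Feature",
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]],
            },
            "properties": {"count_lang_a": 3, "count_lang_b": 4},
        },
    ],
}

print(cell_size_from_name("some_region_cell_size=10000m.geojson"))
print(total_counts(sample))

# For a downloaded file, the same function applies after loading it:
# with open(path, encoding="utf-8") as f:
#     totals = total_counts(json.load(f))
```

Summing every numeric property is a deliberately generic choice, since the description does not state which properties hold the user counts; once the schema is known, restricting the sum to the relevant keys would be safer.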
Provider:
figshare
Created:
2021-03-30



