Harvard CGA Geotweet Sentiment Archive
Source: Mendeley Data. Updated 2024-03-27; indexed 2024-06-28.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/X2KJPC
Description:
Harvard CGA Geotweet Sentiment Archive is a subset of Harvard CGA Geotweet Archive v2.0 enriched with a sentiment score. It contains tweet identification records, each paired with a sentiment score based on the tweet text, for about 4.3 billion geo-tagged tweets posted since 2019.
The sentiment score was calculated using Bidirectional Encoder Representations from Transformers (BERT). More information about this methodology can be found in our Nature paper on the Twitter Sentiment Geographical Index.
This dataset is available to the academic community at large, unlike Harvard CGA Geotweet Archive v2.0, which cannot be shared publicly because of Twitter's redistribution policy. It can serve as cross-validation data for publications that used data from Harvard CGA Geotweet Archive v2.0.
If you are interested in accessing this archive, please fill out our Geotweet Request Form. Before requesting or receiving Tweet IDs, requestors must agree to Twitter's Terms of Service, Twitter's Privacy Policy, and Twitter's Developer Policy.
Geotweet ID data provided by CGA can only be used for not-for-profit research and academic purposes. Recipients may not share CGA-provided Tweet IDs or content derived from them without written permission from the CGA.
Citations: If you use the Geotweet Archive in your research, please reference it as "Harvard CGA Geotweet IDs Archive".
========================================================
Schema of Geotweet Census Archive
Field name____TYPE____Description
message_id----TEXT----Tweet ID
score----FLOAT----BERT sentiment score
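The two-field schema above can be read with a few lines of Python. This is only a minimal sketch: the source does not state the file format in which the archive is delivered, so the example assumes a CSV export with a header row, and the function name `load_scores`, the sample IDs, and the sample scores are all hypothetical.

```python
import csv
import io
from statistics import mean


def load_scores(lines):
    """Parse rows of (message_id, score) into a dict keyed by tweet ID."""
    reader = csv.DictReader(lines)
    scores = {}
    for row in reader:
        # message_id is TEXT in the schema: keep it as a string so that
        # long tweet IDs are never rounded by a numeric conversion.
        scores[row["message_id"]] = float(row["score"])
    return scores


# Hypothetical sample rows; the IDs and scores are made up for illustration.
# A CSV layout with a header row is an assumption, not stated by the source.
sample = io.StringIO(
    "message_id,score\n"
    "1212345678901234567,0.91\n"
    "1212345678901234568,0.12\n"
    "1212345678901234569,0.55\n"
)
scores = load_scores(sample)

# Cross-validation use case: look up scores for tweet IDs from your own study
# and ignore IDs that are not present in the archive.
my_ids = ["1212345678901234567", "1212345678901234569", "999"]
matched = {i: scores[i] for i in my_ids if i in scores}
mean_score = mean(matched.values())
print(len(matched), round(mean_score, 2))
```

Keeping `message_id` as text mirrors the TEXT type in the schema and makes joins against other tweet ID lists exact.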
Created:
2023-11-23
Background overview
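Harvard CGA Geotweet Sentiment Archive is a dataset designed for academic research. It contains tweet ID records for about 4.3 billion geo-tagged tweets posted since 2019, each with a sentiment score computed by a BERT model. Use of the dataset is strictly limited by Twitter's Terms of Service and by the CGA to not-for-profit research and academic purposes.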



