Climate Change Tweets Ids
DataONE. Updated 2019-05-20. Indexed 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:9205e806a8af60256978f26a7f80e71b1b0e97a5d99895012adaf8145a0e4f1d
Description:
This dataset contains the tweet ids of 39,622,026 tweets related to climate change. They were collected between September 21, 2017 and May 17, 2019 from the Twitter API using Social Feed Manager. There is a gap in data collection between January 7, 2019 and April 17, 2019.
Tweets were collected using the POST statuses/filter method of the Twitter Stream API, using the track parameter with the following keywords: #climatechange, #climatechangeisreal, #actonclimate, #globalwarming, #climatechangehoax, #climatedeniers, #climatechangeisfalse, #globalwarminghoax, #climatechangenotreal, climate change, global warming, climate hoax.
Because of the size of the collection, the list of identifiers is split into files of 10 million lines each, with one tweet identifier per line. A README.txt file contains additional documentation on how the tweets were collected.
The GET statuses/lookup method supports retrieving the complete tweet for a tweet id (known as hydrating). Tools such as Twarc or Hydrator can be used to hydrate tweets.
Per Twitter’s Developer Policy, tweet ids may be publicly shared for academic purposes; tweets may not. Questions about this dataset can be sent to sfm@gwu.edu. George Washington University researchers should contact us for access to the tweets.
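As a rough illustration of preparing the id files for hydration, the sketch below reads one tweet id per line and groups the ids into batches. It assumes the commonly documented limit of 100 ids per GET statuses/lookup request. The helper names (read_tweet_ids, batched, lookup_id_param) are hypothetical and are not part of Social Feed Manager, Twarc, or Hydrator. No network calls are made.

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Assumption: GET statuses/lookup accepts up to 100 ids per request.
LOOKUP_BATCH_SIZE = 100


def read_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield tweet ids from an iterable of lines, skipping blank lines."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def batched(ids: Iterable[str], size: int = LOOKUP_BATCH_SIZE) -> Iterator[List[str]]:
    """Group ids into lists of at most `size` items, streaming lazily."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def lookup_id_param(batch: List[str]) -> str:
    """Join one batch into the comma-separated value used for the `id` parameter."""
    return ",".join(batch)


if __name__ == "__main__":
    # Stand-in for the lines of one id file; a real file has 10 million lines.
    sample_lines = ["1001\n", "1002\n", "\n", "1003\n"]
    for batch in batched(read_tweet_ids(sample_lines), size=2):
        print(lookup_id_param(batch))
```

Because both helpers are generators, a 10-million-line file can be passed in as an open file object without loading it into memory. Hydration tools generally handle batching and rate limits themselves, so a sketch like this is mainly useful for understanding the file layout or for custom pipelines.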
Created:
2023-11-22



