Mining the first 100 days: Human and data ethics in Twitter research

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.d2547d83h
Description:
This dataset consists of tweet identifiers for tweets harvested from November 28, 2016, following the election of Donald Trump, through the end of the first 100 days of his administration. Data collection ended May 1, 2017. Tweets were harvested using multiple methods described below. The total dataset consists of 218,273,152 tweets. Because of the different methods used to harvest tweets, there may be some duplication.

Methods

Data were harvested from the Twitter API using the following endpoints: search, timeline, and filter.

Three tweet sets were harvested using the search endpoint, which returns tweets that include a specific search term, user mention, hashtag, etc. The table below provides the search term, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented.

| Search term | Dates collected | Count tweets | Count unique users |
| --- | --- | --- | --- |
| @realDonaldTrump user mention | 2016-11-28 - 2017-05-01 | 4,597,326 | 1,501,806 |
| "Trump" in tweet text | 2017-01-18 - 2017-05-01 | 11,055,772 | 2,648,849 |
| #MAGA hashtag | 2017-01-23 - 2017-05-01 | 1,169,897 | 236,033 |

Two tweet sets were harvested using the timeline endpoint, which returns tweets published by specific users. The table below provides the user whose timeline was harvested, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented. Note that in these cases, tweets were necessarily limited to the one unique user whose tweets were harvested.

| User | Dates collected | Count tweets | Count unique users |
| --- | --- | --- | --- |
| realDonaldTrump | 2016-12-21 - 2017-05-01 | 902 | 1 |
| trumpRegrets | 2017-01-15 - 2017-05-01 | 1,751 | 1 |

The largest tweet set was harvested using the filter endpoint, which allows for streaming data access in near real time. Requests made to this API can be filtered to include tweets that meet specific criteria. The table below provides the filters used, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented. Filtering via the API uses a default "OR," so the tweets included in this set satisfied any of the filter terms. The script used to harvest streaming data from the filter API was built using the Python `tweepy` library.

| Filter terms | Dates collected | Count tweets | Count unique users |
| --- | --- | --- | --- |
| tweets by realDonaldTrump; tweet mentions @realDonaldTrump; 'maga' in text; 'trump' in text; 'potus' in text | 2017-01-26 - 2017-05-01 | 201,447,504 | 12,489,255 |

Harvested tweets, including all corresponding metadata, were stored in individual JSON files (one file per tweet).

Data Processing: Conversion to CSV format

Per the terms of Twitter's developer API, tweet datasets may be shared for academic research use. Sharing tweet data is limited to sharing the identifiers of tweets, which must be re-harvested to account for deletions and/or modifications of individual tweets. It is not permitted to share the originally harvested tweets in JSON format.

Tweet identifiers have been extracted from the JSON data and saved as plain text CSV files. The CSV files all have a single column:

id_str (string): A tweet identifier

The data include one tweet identifier per row.
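
The conversion step described above can be illustrated with a short Python sketch. This is not the authors' script. It assumes one directory of `*.json` files per tweet set, a top-level `id_str` field in each file, and a header row named `id_str` in each CSV. The function names `extract_ids` and `load_unique_ids` are hypothetical. The second function shows one way to collapse the duplicates that may arise when the same tweet appears in more than one tweet set.

```python
import csv
import json
from pathlib import Path


def extract_ids(json_dir, csv_path):
    """Write the id_str of each tweet JSON file in json_dir to a one-column CSV.

    Assumes one tweet per file with a top-level "id_str" field (hypothetical layout).
    Returns the number of identifiers written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["id_str"])  # assumed header, matching the column name
        for path in sorted(Path(json_dir).glob("*.json")):
            with open(path, encoding="utf-8") as f:
                tweet = json.load(f)
            # Keep the identifier as a string, as the column type specifies.
            writer.writerow([tweet["id_str"]])
            count += 1
    return count


def load_unique_ids(csv_paths):
    """Return the set of distinct tweet identifiers across several CSV files."""
    unique = set()
    for csv_path in csv_paths:
        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                unique.add(row["id_str"])
    return unique
```

Reading the identifiers into a set is the simplest way to deduplicate, but at the scale reported here (over 200 million identifiers) it may need substantial memory. A sort-and-merge on disk would be an alternative.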
Created:
2021-08-09