Mining the first 100 days: Human and data ethics in Twitter research
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.d2547d83h
Description:
This dataset consists of tweet identifiers for tweets harvested from November 28, 2016, shortly after the election of Donald Trump, through the end of the first 100 days of his administration. Data collection ended May 1, 2017.
Tweets were harvested using the multiple methods described below. The total dataset consists of 218,273,152 tweets. Because the harvesting methods overlap in what they capture, the same tweet may appear in more than one tweet set.
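Because the tweet sets may overlap, anyone combining them will probably want to deduplicate identifiers first. The following is a minimal sketch, not part of the original dataset tooling. It assumes the identifier files are the one-column CSV files described under Data Processing, and that a header row reading `id_str` may or may not be present. The function names `iter_ids` and `unique_ids` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator


def iter_ids(csv_path: Path) -> Iterator[str]:
    """Yield tweet identifiers from a one-column CSV file.

    Identifiers are kept as strings, matching the id_str column.
    A header row equal to 'id_str' is skipped if present (assumption).
    """
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # ignore blank lines
            value = row[0].strip()
            if value and value != "id_str":
                yield value


def unique_ids(csv_paths: Iterable[Path]) -> Iterator[str]:
    """Yield each identifier once, in first-seen order, across all files."""
    seen = set()
    for path in csv_paths:
        for tweet_id in iter_ids(path):
            if tweet_id not in seen:
                seen.add(tweet_id)
                yield tweet_id
```

An in-memory set is convenient for the smaller tweet sets. For the full collection of more than 200 million identifiers, an on-disk sort or a database may be more practical.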
Methods
Data were harvested from the Twitter API using the following endpoints: search, timeline, and filter.
Three tweet sets were harvested using the search endpoint, which returns tweets that include a specific search term, user mention, hashtag, etc. The table below provides the search term, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented.
| Search term | Dates collected | Tweet count | Unique user count |
| --- | --- | --- | --- |
| @realDonaldTrump user mention | 2016-11-28 - 2017-05-01 | 4,597,326 | 1,501,806 |
| "Trump" in tweet text | 2017-01-18 - 2017-05-01 | 11,055,772 | 2,648,849 |
| #MAGA hashtag | 2017-01-23 - 2017-05-01 | 1,169,897 | 236,033 |
Two tweet sets were harvested using the timeline endpoint, which returns tweets published by specific users. The table below provides the user whose timeline was harvested, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented. Note that in these cases, tweets were necessarily limited to the one unique user whose tweets were harvested.
| User | Dates collected | Tweet count | Unique user count |
| --- | --- | --- | --- |
| realDonaldTrump | 2016-12-21 - 2017-05-01 | 902 | 1 |
| trumpRegrets | 2017-01-15 - 2017-05-01 | 1,751 | 1 |
The largest tweet set was harvested using the filter endpoint, which allows for streaming data access in near real time. Requests made to this API can be filtered to include tweets that meet specific criteria. The table below provides the filters used, data collection dates, the total number of tweets in the corresponding tweet set, and the total number of unique Twitter users represented.
Filtering via the API uses a default "OR," so the tweets included in this set satisfied any of the filter terms.
The script used to harvest streaming data from the filter API was built using the Python `tweepy` library.
| Filter terms | Dates collected | Tweet count | Unique user count |
| --- | --- | --- | --- |
| tweets by realDonaldTrump; tweet mentions @realDonaldTrump; 'maga' in text; 'trump' in text; 'potus' in text | 2017-01-26 - 2017-05-01 | 201,447,504 | 12,489,255 |
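To illustrate what the default "OR" means for this tweet set, the sketch below checks a tweet against the five filter terms and accepts it if any one matches. This is a simplified local approximation, not the matching logic Twitter applied server-side and not the authors' harvesting script. It assumes each tweet is a dictionary with `text`, `user.screen_name`, and `entities.user_mentions` fields, and it uses case-insensitive substring matching. The names `TRACK_TERMS`, `ACCOUNT`, and `matches_filter` are hypothetical.

```python
# Assumed filter terms, taken from the table above.
TRACK_TERMS = ("maga", "trump", "potus")
ACCOUNT = "realdonaldtrump"


def matches_filter(tweet: dict) -> bool:
    """Return True if the tweet satisfies ANY filter term (logical OR)."""
    text = tweet.get("text", "").lower()
    author = tweet.get("user", {}).get("screen_name", "").lower()
    mentions = {
        mention.get("screen_name", "").lower()
        for mention in tweet.get("entities", {}).get("user_mentions", [])
    }
    return (
        author == ACCOUNT                          # tweets by the account
        or ACCOUNT in mentions                     # tweets mentioning the account
        or any(term in text for term in TRACK_TERMS)  # keyword in text
    )
```

Because a single matching term is enough, a tweet containing only "potus" is included even if it never mentions "trump" or the account.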
Harvested tweets, including all corresponding metadata, were stored in individual JSON files (one file per tweet).
Data Processing: Conversion to CSV format
Per the terms of Twitter's developer API, tweet datasets may be shared for academic research use. Sharing tweet data is limited to sharing the identifiers of tweets, which must be re-harvested to account for deletions and/or modifications of individual tweets. It is not permitted to share the originally harvested tweets in JSON format.
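Re-harvesting from identifiers is usually done in batches, since lookup endpoints accept a limited number of identifiers per request. The helper below is a minimal sketch that only groups identifiers and makes no network calls. The name `id_batches` is hypothetical, and the default batch size of 100 is an assumption that should be checked against the current API documentation.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def id_batches(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Group tweet identifiers into lists of at most `size` items.

    The default size is an assumed per-request limit, not a documented
    property of this dataset.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch can then be passed to whichever lookup client is in use. Identifiers that are not returned correspond to tweets that have since been deleted or made unavailable.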
Tweet identifiers have been extracted from the JSON data and saved as plain text CSV files. The CSV files all have a single column:
id_str (string): A tweet identifier
The data include one tweet identifier per row.
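The conversion described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the script used to produce the dataset: it assumes each JSON file holds one tweet object with an `id_str` field, that files end in `.json`, and that the output starts with an `id_str` header row. The function name `extract_ids` is hypothetical.

```python
import csv
import json
from pathlib import Path


def extract_ids(json_dir: Path, csv_path: Path) -> int:
    """Write the id_str of every tweet JSON file in json_dir to csv_path.

    Returns the number of identifiers written.
    """
    count = 0
    with csv_path.open("w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["id_str"])  # header row (assumption)
        for json_file in sorted(json_dir.glob("*.json")):
            with json_file.open(encoding="utf-8") as handle:
                tweet = json.load(handle)
            # Use id_str rather than the numeric id so the identifier
            # stays a string end to end.
            writer.writerow([tweet["id_str"]])
            count += 1
    return count
```

Reading `id_str` instead of the numeric `id` field keeps the 64-bit identifiers as text, which avoids precision loss in tools that parse large integers as floating-point numbers.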
Created:
2021-08-09



