Japanese Sample Tweets, COVID-19 Keywords and Emotions from 2020-01-01 to 2020-06-30 (88,495,817 tweets and 47,539,139 retweets)
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/3972996
Description:
Data
Tweets_YYYY-MM.tsv.gz:
Tab-separated columns:
1. Tweet id
2. Date and time (JST) when the tweet was posted
3. Tweet id of the mention destination
4. Tweet id of the retweet source
5. Place id
6. Country code
7. Prefecture code (if the country code is JP)
8. COVID-19-related keyword included in the tweet
Columns with no information are empty. For example, a tweet with an empty eighth column is not a COVID-19-related tweet.
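The column layout above can be read with a few lines of Python. This is a minimal sketch, not the author's tooling: the field names in TWEET_COLUMNS are labels invented here, the absence of a header row in the Tweets files is an assumption, and the sample rows (ids, timestamp format, place id, prefecture code) are made up for illustration.

```python
import csv
import gzip
import io

# Labels chosen for this sketch only; the description above does not name the columns.
# Assumption: Tweets_YYYY-MM.tsv.gz has no header row.
TWEET_COLUMNS = [
    "tweet_id", "created_at_jst", "mention_tweet_id", "retweet_source_id",
    "place_id", "country_code", "prefecture_code", "covid_keyword",
]

def read_tweets(fileobj):
    """Yield one dict per row from a gzip-compressed, tab-separated byte stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Pad short rows so that missing trailing columns become empty strings.
            row = row + [""] * (len(TWEET_COLUMNS) - len(row))
            yield dict(zip(TWEET_COLUMNS, row))

def is_covid_related(tweet):
    """A tweet counts as COVID-19-related when its eighth column is non-empty."""
    return tweet["covid_keyword"] != ""

# Made-up sample rows in the described layout (not real data).
SAMPLE = (
    "1000000000000000001\t2020-01-01 00:00:01\t\t\t\t\t\t\n"
    "1000000000000000002\t2020-01-01 00:00:02\t\t1000000000000000001\tabc123\tJP\t13\tコロナ\n"
)
tweets = list(read_tweets(io.BytesIO(gzip.compress(SAMPLE.encode("utf-8")))))
covid_tweets = [t for t in tweets if is_covid_related(t)]
```

For a downloaded file, passing a path such as "Tweets_2020-01.tsv.gz" to read_tweets should work as well, since gzip.open accepts either a filename or a file object.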
This data was collected using statuses/sample of the Twitter Streaming API, filtered by language=ja, so most of the tweets are in Japanese. Also, due to a failure of the data collection server, a large number of tweets from January 22 are missing :(
We have used 肺炎, コロナ and COVID (case insensitive) as keywords related to COVID-19.
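The keyword rule above can be approximated with a single case-insensitive regular expression. This is a sketch of one plausible matcher, not the code used to build the dataset: the function name find_covid_keywords is hypothetical, and how the dataset records a tweet containing more than one keyword is not stated here.

```python
import re

# The three keywords named above; IGNORECASE only matters for the Latin-script "COVID".
KEYWORD_PATTERN = re.compile(r"肺炎|コロナ|COVID", re.IGNORECASE)

def find_covid_keywords(text):
    """Return every keyword occurrence in the text, in order of appearance."""
    return KEYWORD_PATTERN.findall(text)

matches = find_covid_keywords("新型コロナ (covid-19) のニュース")
no_matches = find_covid_keywords("今日は晴れ")
```

Matches are returned as they appear in the text, so a lowercase "covid" stays lowercase; normalizing the case is left to the caller.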
Emotions_YYYY-MM.tsv.gz:
The first column is the tweet id, the second and subsequent columns are the number of occurrences of each emotional keyword. Column names (types of emotion) are shown in the first row.
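Because the emotion names come from the header row, a reader does not need to hard-code them. The sketch below sums each emotion column over a file. The header names in the sample ("id", "emotion_a", "emotion_b") and the sample counts are placeholders invented for illustration, not the dataset's real column names.

```python
import csv
import gzip
import io
from collections import Counter

def total_emotion_counts(fileobj):
    """Sum the per-tweet keyword counts for each emotion column."""
    totals = Counter()
    with gzip.open(fileobj, mode="rt", encoding="utf-8", newline="") as text:
        reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)       # first row holds the column names
        emotion_names = header[1:]  # everything after the tweet id column
        for row in reader:
            for name, value in zip(emotion_names, row[1:]):
                # Treat an empty cell as zero occurrences (assumption).
                totals[name] += int(value or 0)
    return totals

# Made-up sample with placeholder emotion names (not real data).
SAMPLE = (
    "id\temotion_a\temotion_b\n"
    "1000000000000000001\t1\t0\n"
    "1000000000000000002\t2\t1\n"
)
totals = total_emotion_counts(io.BytesIO(gzip.compress(SAMPLE.encode("utf-8"))))
```

Joining these counts with the Tweets file on the tweet id would then allow emotion totals to be restricted to COVID-19-related tweets.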
We used mlask43-simple (Perl implementation of ML-Ask) with dictionaries used in pymlask to extract emotional keywords from the tweet.
Publication
This data set was created for my study. If you make use of this data set, please cite:
Mitsuo Yoshida. The State of Social Media During the COVID-19 Pandemic: Japan's Situation, Research Trends and Public Datasets. Journal of Japanese Society for Artificial Intelligence (in Japanese). vol.35, no.5, pp.644-653, 2020.
吉田光男. COVID-19流行下におけるソーシャルメディア ―日本での状況と研究動向・公開データセット―. 人工知能. vol.35, no.5, pp.644-653, 2020.
Created:
2020-09-01



