Twitter News Dataset

DataCite Commons | Updated: 2025-06-01 | Indexed: 2024-07-25
Download link:
https://figshare.com/articles/dataset/tweets_csv_gz/3465974/2
Resource description:
This dataset consists of 5234 news events obtained from Twitter, together with the tweets that discuss them.

The file tweets.csv.gz contains a CSV file, tweets.csv, listing the IDs of all tweets associated with each event in events.csv. Each line has the following format:

tweet_ID, event_ID

where:
- tweet_ID is a long integer giving the Twitter ID of the tweet. The full information about the tweet can be retrieved from this ID using the Twitter REST API.
- event_ID is the ID of the event that the tweet belongs to.

The file events.csv.gz contains a CSV file, events.csv, with all the news events captured from Twitter from August 2013 to June 2014. Each line has the following format:

event_ID,date,total_keywords,total_tweets,keywords

where:
- event_ID is an integer identifying the event. There are 5234 events, so event_ID ranges from 1 to 5234.
- date is the date of the event (connected component), in YYYY-MM-DD format.
- total_keywords is an integer giving the number of keywords in the event (connected component).
- total_tweets is an integer giving the number of tweets that belong to the event.
- keywords is a string containing the event's total_keywords keywords, separated by semicolons.

The files cluster_labels.txt and time_resolutions.txt contain, respectively, the cluster label of each event and the time resolutions learned from all events. cluster_labels.txt contains one integer per line, from 0 to 19; the label on line i is the cluster label of the event whose event_ID is i. time_resolutions.txt contains one floating-point number per line, each a time resolution in minutes learned from all events. The file holds 20 numbers, one per line, in increasing order, each with at most 13 digits after the decimal point.
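The file formats described above can be read with a short Python sketch. It assumes that each .gz file is a plain gzip stream that decompresses directly to the CSV text, that the files have no header row, and that all four files sit in one directory. The names Event, parse_events, group_tweets, parse_cluster_labels, parse_time_resolutions and load_all are hypothetical helpers for illustration, not part of the dataset.

```python
import csv
import gzip
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Event:
    event_id: int
    date: str             # YYYY-MM-DD
    total_keywords: int
    total_tweets: int
    keywords: list[str]   # keywords field split on ';'


def parse_events(lines: Iterable[str]) -> dict[int, Event]:
    """Parse events.csv lines: event_ID,date,total_keywords,total_tweets,keywords.

    Assumption: no header row. If a keyword string happens to contain
    unquoted commas, the trailing fields are rejoined before splitting on ';'.
    """
    events: dict[int, Event] = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        event_id, date, total_kw, total_tw, *rest = row
        keywords = [k.strip() for k in ",".join(rest).split(";") if k.strip()]
        events[int(event_id)] = Event(int(event_id), date.strip(),
                                      int(total_kw), int(total_tw), keywords)
    return events


def group_tweets(lines: Iterable[str]) -> dict[int, list[int]]:
    """Group tweet IDs by event from tweets.csv lines: tweet_ID, event_ID."""
    by_event: dict[int, list[int]] = defaultdict(list)
    for row in csv.reader(lines):
        if row:
            # int() tolerates the space after the comma.
            tweet_id, event_id = int(row[0]), int(row[1])
            by_event[event_id].append(tweet_id)
    return dict(by_event)


def parse_cluster_labels(text: str) -> dict[int, int]:
    """Line i (1-based) of cluster_labels.txt is the label (0-19) of event_ID i."""
    return {i: int(tok) for i, tok in enumerate(text.split(), start=1)}


def parse_time_resolutions(text: str) -> list[float]:
    """time_resolutions.txt: one float per line, in minutes, increasing order."""
    return [float(tok) for tok in text.split()]


def load_all(directory: str = "."):
    """Hypothetical loader; file names follow the description, the directory is assumed."""
    with gzip.open(f"{directory}/events.csv.gz", "rt", encoding="utf-8", newline="") as f:
        events = parse_events(f)
    with gzip.open(f"{directory}/tweets.csv.gz", "rt", encoding="utf-8", newline="") as f:
        tweets = group_tweets(f)
    with open(f"{directory}/cluster_labels.txt", encoding="utf-8") as f:
        labels = parse_cluster_labels(f.read())
    with open(f"{directory}/time_resolutions.txt", encoding="utf-8") as f:
        resolutions = parse_time_resolutions(f.read())
    return events, tweets, labels, resolutions
```

Keeping parsing separate from file opening lets the parsers run on in-memory lines. After loading, len(tweets.get(e, [])) could be compared with events[e].total_tweets for each event e as a quick consistency check.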
Provided by:
figshare
Created:
2016-06-28