TWITTER CASCADE DATASET
Source: larc.smu.edu.sg (indexed 2025-03-26)
Download link:
https://larc.smu.edu.sg/twitter-cascade-dataset
Description:
This dataset comprises a set of information cascades generated by Singapore Twitter users. Here, a cascade is defined as a set of tweets about the same topic.

The dataset was collected via the Twitter REST and streaming APIs as follows. Starting from popular seed users (i.e., users with many followers), we crawled their follow, retweet, and user-mention links. We then added those followers/followees, retweet sources, and mentioned users who state Singapore in their profile location. This yielded a total of 184,794 Twitter user accounts. Tweets were then crawled from these users from 1 April to 31 August 2012, giving 32,479,134 tweets in all.

To identify cascades, we extracted all URL links and hashtags from these tweets and treated them as the identities of cascades. In other words, all tweets that contain the same URL link (or the same hashtag) represent one cascade. Mathematically, a cascade is represented as a set of user-timestamp pairs. Figure 1 provides an example: cascade C = {<u1, t1>, <u2, t2>, <u1, t3>, <u3, t4>, <u4, t5>}.
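The cascade definition in the description can be illustrated with a minimal Python sketch. It is not the dataset authors' extraction code: the function name `build_cascades`, the input layout of `(user, timestamp, text)` tuples, the two regular expressions, and the case-insensitive hashtag matching are all assumptions made for illustration, and the sample tweets are invented to mirror the example cascade C.

```python
import re
from collections import defaultdict

# Assumed, simplified patterns; real tweet parsing is more involved.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def build_cascades(tweets):
    """Group tweets into cascades keyed by URL or hashtag.

    `tweets` is an iterable of (user, timestamp, text) tuples (hypothetical
    layout). Returns a dict mapping each URL/hashtag to a list of
    (user, timestamp) pairs sorted by timestamp.
    """
    cascades = defaultdict(list)
    for user, ts, text in tweets:
        keys = set(URL_RE.findall(text))
        # Remove URLs first so a "#fragment" inside a URL is not read as a hashtag.
        text_without_urls = URL_RE.sub(" ", text)
        # Assumption: hashtags are compared case-insensitively.
        keys |= {"#" + tag.lower() for tag in HASHTAG_RE.findall(text_without_urls)}
        for key in keys:
            cascades[key].append((user, ts))
    for pairs in cascades.values():
        pairs.sort(key=lambda pair: pair[1])
    return dict(cascades)


# Invented sample data: five tweets sharing one URL, two sharing a hashtag.
tweets = [
    ("u1", 1, "reading http://example.com/a"),
    ("u2", 2, "RT http://example.com/a #news"),
    ("u1", 3, "again http://example.com/a"),
    ("u3", 4, "http://example.com/a"),
    ("u4", 5, "wow http://example.com/a #News"),
]
cascades = build_cascades(tweets)
print(cascades["http://example.com/a"])
print(cascades["#news"])
```

Note that a user may appear more than once in a cascade, as u1 does here and in the example C, so each cascade is kept as a list of pairs rather than a mapping from user to a single timestamp.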
Provided by:
Living Analytics Research Centre

The above content was collected and summarized by 遇见数据集.



