Insight4news Irish news related hashtagged tweet collection 26.06.2015-24.05.2017
Source: Figshare. Updated: 2019-09-13. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/Insight4news_Irish_news_related_hashtagged_tweet_collection_26_06_2015-24_05_2017/9826274/1
Description:
<pre>The 1.5GB .tar.gz file contains a 4.3GB (uncompressed) .txt file with <b>225'559'593</b> rows, each of which is a tweet ID.</pre>
<pre>These tweets were collected during the period <i>26.06.2015-24.05.2017</i> with the <i>Hashtagger</i> platform (presented in https://doi.org/10.1145/2872427.2882982 by Shi et al.), which considered them relevant to the monitored stream of news from Irish sources (The Irish Times, Irish Examiner, etc.).</pre>
<pre><br></pre>
<pre>All 225'559'593 tweets are in English (with 'en' in the 'lang' field of the JSON objects provided by GNIP) and contain at least one hashtag.</pre>
<pre><br></pre>
<pre>Hydrate the tweet IDs with Twarc (https://github.com/edsu/twarc) and write the result to a file. You will need to provide Twarc with a set of Twitter API keys.</pre>
<pre><i> twarc.py --hydrate tweet_ids.txt > tweets.json</i></pre>
<pre>Hydrating all the tweets in one go is probably not a good idea; it may be better to split the file into chunks and hydrate the tweets chunk by chunk.</pre>
<pre><br></pre>
<pre>When using the dataset, please cite the following paper, for which this dataset was generated.</pre>
<pre><b>Addressing Information Overload through Text Mining across News and Social Media Streams</b></pre>
<pre><i>Gevorg Poghosyan</i></pre>
<pre>SIdEWayS'19: 5th International Workshop on Social Media World Sensors, 2019</pre>
<pre>https://doi.org/10.1145/3345645.3351105</pre>
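The chunk-by-chunk approach suggested above could be prepared with a small script like the one below. This is only a minimal sketch, not part of the dataset: the function name `split_ids`, the output file naming, and the default chunk size of 1,000,000 IDs are all assumptions chosen for illustration, and the input is assumed to be the extracted one-ID-per-line text file.

```python
from itertools import islice
from pathlib import Path


def split_ids(src, out_dir, chunk_size=1_000_000):
    """Split a one-ID-per-line file into numbered chunk files.

    Hypothetical helper: the name, the output naming scheme, and the
    default chunk size are illustrative assumptions.
    Returns the list of chunk file paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, encoding="utf-8") as fh:
        # Stream the file lazily and skip blank lines, so the full
        # 4.3GB file never has to be held in memory at once.
        ids = (line.strip() for line in fh if line.strip())
        while True:
            chunk = list(islice(ids, chunk_size))
            if not chunk:
                break
            path = out_dir / f"tweet_ids_{len(paths):05d}.txt"
            path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
            paths.append(path)
    return paths
```

Each resulting file (for example `tweet_ids_00000.txt`) could then be passed to the Twarc command from the description in place of `tweet_ids.txt`, with the output redirected to a separate JSON file per chunk, so that an interrupted run only requires repeating the affected chunk.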
Created:
2019-09-13



