ERC Socsemics – EU19 - UK [UK perimeter] - 2023 - interactions
Source: DataCite Commons. Updated: 2025-06-27. Indexed: 2025-04-16.
Download link:
https://nakala.fr/10.34847/nkl.0cb4qt8c
Resource description:
==Context==
The present dataset stems from a two-year data collection effort focused on a subset of Twitter users from what has been denoted over the course of Socsemics as the “EU19” perimeter. It builds on an initial collection, performed in 2019, of all tweets pertaining to the 2019 EU parliamentary elections (identified through a set of relevant hashtags) and published between one month before and one month after the vote (April 26-June 28, 2019). The so-called UK [UK perimeter] within that dataset focuses on English-language tweets published by a subset of 11,889 users who are likely based in England and for whom a political ideology score was calculated according to the method of Barberá (2015).
==Dataset scope==
This dataset consists of interaction (retweet and reply) networks for popular hashtags that were consistently active during the twelve-month period between September 2020 and August 2021. In total, 815,435 unique hashtags were used in the (English-language) tweets published by this set of users over the twelve-month period. The top 0.1% most frequently used hashtags were filtered down to those used at least 500 times in each month, and hashtags related to news sites (#breaking, #newsnight, #bbcnews, #capitalreports, etc.) were removed. Separate retweet and reply graphs were constructed for the entire time period. In the retweet graphs, user nodes are connected by a directed edge from user A to user B when, at any point in the twelve months, user A retweeted a tweet (containing the respective hashtag) originally published by user B; reply graphs are built analogously from replies. Finally, hashtag/tweet-type pairs were discarded when the final edge list contained fewer than 100 edges, yielding a total of 99 hashtag retweet networks and 74 hashtag reply networks.
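The construction and filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input layout (tuples of hashtag, interaction type, origin user, target user) and the function name `build_networks` are assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_networks(interactions, min_edges=100):
    """Group interactions into one directed edge list per (hashtag, type) pair.

    `interactions` is an iterable of (hashtag, kind, origin, target) tuples,
    a hypothetical layout chosen for this sketch. Each distinct
    (origin, target) pair is one directed edge; its count is the number of
    interactions observed for that pair.
    """
    networks = defaultdict(Counter)
    for hashtag, kind, origin, target in interactions:
        networks[(hashtag, kind)][(origin, target)] += 1

    # Discard hashtag/type pairs whose edge list has fewer than `min_edges` edges.
    return {
        key: dict(edges)
        for key, edges in networks.items()
        if len(edges) >= min_edges
    }
```

With the threshold of 100 stated above, a hashtag can survive as a retweet network while its reply network is dropped, which is consistent with the differing counts of retweet and reply networks.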
==Archive content==
All filenames are prefixed by “EU19-UK-2023-interactions-”, and edge lists end with “-ht.csv” for each hashtag for which such a file exists. All files are in CSV format. They have all been entirely anonymized and rely on four types of information: (1) anonymized user IDs (from 0 to 10429), “U-ID”; (2) the date of the Monday of the week in which the interaction tweet(s) was/were published; (3) an edge weight indicating the number of tweets of this type of interaction published between the pair of user nodes in the given week; (4) IP values, real numbers whereby negative (resp. positive) values indicate left-wing (resp. right-wing) leaning users.
• “-nodes.csv”, two columns: U-ID, IP value (real number whereby negative (resp. positive) values indicate left-wing (resp. right-wing) leaning users); (1 file, 10,430 rows).
• “-edges_retweet-”, four columns: U-ID origin (retweeting), U-ID target (retweeted), edge weight, week; (99 files).
• “-edges_reply-”, four columns: U-ID origin (replying), U-ID target (replied to), edge weight, week; (74 files).
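As an illustration of how these files might be read, the sketch below parses a nodes file and an edge file with Python's standard `csv` module and sums the weekly weights into one total per directed pair. It assumes the column order listed above; whether the files contain a header row is not stated here, so that is exposed as a parameter. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def read_nodes(handle, has_header=True):
    """Return {U-ID: IP value} from an open nodes file (columns: U-ID, IP value)."""
    reader = csv.reader(handle)
    if has_header:  # assumption: adjust if the file has no header row
        next(reader, None)
    return {int(row[0]): float(row[1]) for row in reader if row}


def read_edges(handle, has_header=True):
    """Return {(origin, target): total weight} from an open edge file.

    Columns are assumed to be: U-ID origin, U-ID target, edge weight, week.
    Weekly rows for the same pair are summed over the whole period.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)
    totals = defaultdict(float)
    for row in reader:
        if not row:
            continue  # skip blank lines
        origin, target, weight, _week = row
        totals[(int(origin), int(target))] += float(weight)
    return dict(totals)
```

Both functions take an open text handle (for example from `open(path, newline="")`), so they can be applied to any of the retweet or reply edge files without hard-coding a filename.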
==Acknowledgment of funding==
This dataset has been assembled in the framework of the ERC-supported Consolidator Grant “Socsemics” (research performed at CNRS, 2019-24), grant agreement #772743.
==References==
Pablo Barberá. Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Political Analysis, 23(1):76–91, 2015. Publisher: Cambridge University Press.
Provider:
NAKALA - https://nakala.fr (Huma-Num - CNRS)
Created:
2024-10-30



