
ERC Socsemics – EU19 - FR [French perimeter] - 2020 - quotes

Source: DataCite Commons. Updated 2025-06-30; indexed 2025-04-16.
Download link:
https://nakala.fr/10.34847/nkl.5a3d13l4
Resource description:
==Main associated publication==
Roth, C., St-Onge, J., Herms, K. (2022). Quoting is not Citing: Disentangling Affiliation and Interaction on Twitter. In: Benito, R.M., Cherifi, C., Cherifi, H., Moro, E., Rocha, L.M., Sales-Pardo, M. (eds) Complex Networks & Their Applications X. COMPLEX NETWORKS 2021. Studies in Computational Intelligence, vol 1072. Springer, Cham. https://arxiv.org/abs/2112.00554

==Context==
The present dataset stems from a two-year data collection endeavor focused on a portion of Twitter users from what has been denoted over the course of Socsemics as the “EU19” perimeter. It is based on an initial data collection effort, performed in 2019, of all tweets pertaining to the 2019 EU parliamentary elections (based on a set of relevant hashtags) and published between one month before and one month after the vote (April 26-June 28, 2019). The so-called FR [French perimeter] within that dataset focuses on users who were active in French (i.e., publishing at least 15% of their tweets in that language), who published at least 5 tweets over this period (minimum activity), and who were above the median number of 195 followers (minimum visibility). These criteria reduced the number of users from 39,938 to 15,919, of which 14,102 were still active in January 2020 and 13,074 in December 2020, reflecting a relatively low attrition rate given the initial focus on the 2019 elections.

Dataset scope. This dataset focuses only on quote trees, recursively made of quotes stemming from an initial root tweet, as well as the associated retweets; it is limited to content that stems from FR perimeter users over the year 2020. It features additional crucial metadata in the form of Ideal Point (IP) estimates, which give an indication of the political leaning of each user. For more information, please check the associated publication.

==Archive content==
All filenames are prefixed by “EU19-FR-2020-quotetrees-”, and all files are in CSV format. They have all been entirely anonymized and rely on four types of information:
(1) anonymized user IDs (from 0 to 15919), prefixed by “U”;
(2) anonymized tweet IDs (from 0 to 3720433), prefixed by “T”;
(3) timestamps in the form “MMDD HH:MM” (the year being 2020 throughout);
(4) IP values as real numbers, whereby negative (resp. positive) values indicate left-wing (resp. right-wing) leaning users.

• “-IPvalues.csv”, two columns: U-ID, IP value; 15919 users.
• “-quotes.csv”, five columns: U-ID origin (quoting), U-ID target (quoted), T-ID origin (quoting tweet ID), T-ID target (quoted tweet ID), timestamp of the quote; 2 536 480 quote links.
• “-retweets.csv”, three columns: U-ID origin (retweeting), U-ID target (retweeted), T-ID target (retweeted tweet); 20 591 553 retweet links.
• “-timestamps.csv”, two columns: T-ID, timestamp of that tweet.

==Acknowledgment of funding==
This dataset has been assembled in the framework of the ERC-supported Consolidator Grant “Socsemics” (research performed at CNRS, 2019-24), grant agreement #772743.
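
The file layout listed under "Archive content" can be read with a few lines of Python. The following is a minimal sketch, not part of the archive: it assumes comma-separated files without a header row (the description does not say whether one is present, hence the `has_header` switch), and the function names `parse_timestamp`, `load_ip_values` and `quote_ip_pairs` are hypothetical helpers introduced here for illustration. It pairs each quote link with the IP values of the quoting and quoted users, and supplies the year 2020 when parsing the "MMDD HH:MM" timestamps so that February 29 is accepted.

```python
import csv
import io
from datetime import datetime

YEAR = 2020  # the description states that all timestamps fall in 2020


def parse_timestamp(stamp):
    """Parse an 'MMDD HH:MM' string, supplying the year explicitly.

    Without a year, strptime defaults to 1900, which is not a leap year
    and would reject '0229'.
    """
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %m%d %H:%M")


def load_ip_values(fileobj, has_header=False):
    """Read '-IPvalues.csv' (U-ID, IP value) into a dict.

    Assumption: comma delimiter; set has_header=True if the real file
    turns out to start with a header row.
    """
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)
    return {user_id: float(ip) for user_id, ip in reader}


def quote_ip_pairs(fileobj, ip_values, has_header=False):
    """Yield (IP of quoting user, IP of quoted user, time) per quote link.

    Column order follows the description of '-quotes.csv': U-ID origin,
    U-ID target, T-ID origin, T-ID target, timestamp. Links where either
    user has no IP value are skipped.
    """
    reader = csv.reader(fileobj)
    if has_header:
        next(reader, None)
    for u_origin, u_target, _t_origin, _t_target, stamp in reader:
        if u_origin in ip_values and u_target in ip_values:
            yield (ip_values[u_origin], ip_values[u_target],
                   parse_timestamp(stamp))


if __name__ == "__main__":
    # Made-up rows in the documented column order, for illustration only.
    ip = load_ip_values(io.StringIO("U1,-0.8\nU2,1.2\n"))
    quotes = io.StringIO("U1,U2,T10,T5,0301 09:30\n")
    for origin_ip, target_ip, when in quote_ip_pairs(quotes, ip):
        # A sign change between the two IP values marks a quote that
        # crosses the left/right divide.
        print(origin_ip, target_ip, when, origin_ip * target_ip < 0)
```

With the actual archive, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, using the “EU19-FR-2020-quotetrees-” prefix followed by the suffixes listed above.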
Provider:
NAKALA - https://nakala.fr (Huma-Num - CNRS)
Created:
2024-10-30