Resources for the paper: "Social Context in Political Stance Detection: Impact and Extrapolation"
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14207925
Description:
This repository contains the resources for our paper "Social Context in Political Stance Detection: Impact and Extrapolation" by Ramon Villa-Cox, Evan Williams, and Kathleen M. Carley.
In this work, we explore the performance and extrapolation power of political stance-detection models using an existing large-scale weakly-labeled Twitter dataset collected around the 2019 South American Protests [1]. We construct transformer-based user and tweet encoders to embed users in a low-dimensional space using their text and ego-networks. We then train heterogeneous graph attention networks to predict user stances and contrast their ability to extrapolate stance predictions to different country contexts.
The protest dataset, collected between September 25 and December 24, 2019, contains 550k labeled users, split unevenly across the four countries, and over 36 million labeled tweets. It contains an additional 1.1 million unlabeled neighbors and 40 million unlabeled tweets. This repository includes the anonymized datasets necessary to reproduce the results and tables of the paper. In addition, we include the corresponding anonymized resources for the new weakly-labeled dataset around the 2020 Chilean Referendum presented in our paper.
Following Twitter's January 2023 User Protection Policy update, tweet or user IDs related to sensitive political events cannot be publicly shared. We respect this policy, and only share:
The anonymized user ID, the user's weak stance label, the label predicted by each model, and the data split (train, validation, or test) the user was assigned to.
The anonymized user network edges used by the different network classifiers.
The type of tweet each edge represents (Original, Reply, or Quote).
The user embeddings produced by the User Transformer, which serve as input to the different network models.
This repository contains the following files:
Main_Predictions.7z: Compressed folder containing the anonymized user IDs, their stance labels, and each model's prediction for the country it was trained on. The performance metrics for each model can be computed from the test split for each country. This folder also includes the results for the Chilean Referendum.
Cross_Predictions.7z: Compressed folder containing the results of the cross-country experiments for each anonymized user. The performance metrics for each model when applied to a different country can be computed from each complete file. Users seen during the training of each model are excluded, as described in the paper.
Tweet_Level_Edgelists.7z: Compressed folder containing anonymized tweet edge lists that indicate the interaction type of each edge (Original, Reply, or Quote).
User_Networks.7z: Compressed folder containing different anonymized user edge lists for each interaction type.
Embeddings.zip: Compressed PyTorch tensor files containing the user embeddings produced by the User Transformer, which serve as input to the different network models. These are provided for the main results and for the cross-country and referendum experiments.
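As an illustration of how a typed edge list like those described above could be turned into one weighted user network per interaction type, here is a minimal sketch. The column names (`source`, `target`, `type`) and the CSV layout are assumptions, not the documented format of the released files. Only Reply and Quote rows appear in the sample because how Original tweets are encoded as edges is not specified here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: real file layout and column names may differ.
SAMPLE = """source,target,type
u1,u2,Reply
u1,u2,Reply
u2,u3,Quote
u3,u1,Reply
"""


def build_user_networks(rows):
    """Map each interaction type to a weighted adjacency {source: {target: count}}."""
    networks = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for r in rows:
        # Repeated interactions between the same pair accumulate as edge weight.
        networks[r["type"]][r["source"]][r["target"]] += 1
    return networks


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
networks = build_user_networks(rows)

# Weight of the u1 -> u2 edge in the Reply network.
reply_weight = networks["Reply"]["u1"]["u2"]

# Number of distinct directed edges per interaction type.
edge_counts = {
    edge_type: sum(len(neighbors) for neighbors in adjacency.values())
    for edge_type, adjacency in networks.items()
}
print(reply_weight, edge_counts)
```

Keeping one adjacency per interaction type mirrors the description of User_Networks.7z, which provides separate edge lists for each type.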
The code developed for this study is available at: https://github.com/rvillaco/Protest_Stance_Detection
Created:
2024-11-23



