POLITISKY24: U.S. Political Bluesky Dataset with Stance Labels
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14671772
Description:
POLITISKY24 (Political Stance Analysis on Bluesky for 2024) is a first-of-its-kind dataset for stance detection, focused on the 2024 U.S. presidential election. It is designed for target-specific, user-level stance detection and contains 16,044 user-target stance pairs centered on two key political figures, Kamala Harris and Donald Trump. The dataset also includes detailed metadata, such as complete user posting histories and engagement graphs (likes, reposts, and quotes).
Stance labels were generated using an evaluated pipeline that integrates state-of-the-art Information Retrieval (IR) techniques with Large Language Models (LLMs), offering confidence scores, reasoning explanations, and text spans for each label. With an LLM-assisted labeling accuracy of 81%, POLITISKY24 provides a rich resource for the target-specific stance detection task. The dataset enables exploration of the Bluesky platform, paving the way for deeper insights into political opinions and social discourse, and addressing gaps left by traditional datasets constrained by platform policies.
In the uploaded files:
The file 'Human_annotation_on_validation_users.csv' contains human-annotated stance labels for 445 validation users toward Trump and Harris, resulting in a total of 890 user-target pairs. The labels are divided into four stances: 1 (favor), 2 (against), 3 (neutral), and 4 (unrelated). To simplify the stance annotations provided by the large language model, the "neutral" and "unrelated" categories are combined and represented as "neither."
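To compare the four-way human codes with the three-way LLM labels, the human codes can be collapsed in the same way. The following is a minimal sketch, not the authors' code: the names `HUMAN_CODE_TO_STANCE` and `collapse_human_label` are hypothetical, and the actual column names in the CSV file are not assumed here.

```python
# Hypothetical mapping from the human annotation codes described above
# to the three-way scheme (favor / against / neither).
HUMAN_CODE_TO_STANCE = {
    1: "favor",
    2: "against",
    3: "neither",  # neutral
    4: "neither",  # unrelated
}


def collapse_human_label(code: int) -> str:
    """Return the three-way stance for a human annotation code (1-4)."""
    try:
        return HUMAN_CODE_TO_STANCE[code]
    except KeyError:
        raise ValueError(f"unexpected stance code: {code!r}") from None
```

Raising on unknown codes, rather than defaulting to "neither," keeps malformed rows from being silently counted as a stance.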
The file 'LLM_annotation_on_validation_users.json' contains stance labels annotated by a state-of-the-art LLM for 445 validation users toward Trump and Harris, resulting in a total of 890 user-target pairs. In addition to stance labels, each pair includes an explanation of the reasoning, the source tweets, spans from the source tweets used in the reasoning, and a confidence score.
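Agreement between the human and LLM labels on the validation pairs could be measured along these lines. This is only a sketch under stated assumptions: the in-memory layout (dictionaries keyed by `(user, target)` tuples), the function name `label_accuracy`, and the toy values are all illustrative, and the actual keys in the JSON and CSV files are not specified here.

```python
def label_accuracy(human: dict, llm: dict) -> float:
    """Fraction of shared (user, target) pairs on which the labels agree."""
    shared = human.keys() & llm.keys()
    if not shared:
        raise ValueError("no overlapping user-target pairs")
    matches = sum(human[pair] == llm[pair] for pair in shared)
    return matches / len(shared)


# Toy data for illustration only; not taken from the dataset.
human_labels = {
    ("user_a", "Trump"): "against",
    ("user_a", "Harris"): "favor",
    ("user_b", "Trump"): "neither",
    ("user_b", "Harris"): "against",
}
llm_labels = {
    ("user_a", "Trump"): "against",
    ("user_a", "Harris"): "favor",
    ("user_b", "Trump"): "favor",
    ("user_b", "Harris"): "against",
}
accuracy = label_accuracy(human_labels, llm_labels)  # 3 of 4 pairs agree
```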
The file 'LLM_annotation_on_dataset_users.json' is similar to 'LLM_annotation_on_validation_users.json' but is generated for all dataset users excluding the validation set. It provides stance labels for 8,022 users toward Trump and Harris, totaling 16,044 user-target pairs.
The file 'Main_dataset_for_stance_detection.parquet' contains up to 1,000 recent English-language posts (including both original posts and reposts) from each of the 8,022 + 445 = 8,467 users. This file was used for the stance detection task.
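The "up to 1,000 recent posts per user" selection can be pictured as a per-user truncation. The sketch below is an assumption about how such a cut might be made, not the authors' procedure: the field names `author` and `created_at` and the function `most_recent_posts` are hypothetical, the language filter is omitted, and timestamps are assumed to be ISO 8601 strings in a single format so that they sort chronologically.

```python
from collections import defaultdict
from operator import itemgetter


def most_recent_posts(posts, limit=1000):
    """Group posts by author and keep each author's `limit` newest posts."""
    by_author = defaultdict(list)
    for post in posts:
        by_author[post["author"]].append(post)
    return {
        author: sorted(items, key=itemgetter("created_at"), reverse=True)[:limit]
        for author, items in by_author.items()
    }


# Toy data for illustration only; not taken from the dataset.
toy_posts = [
    {"author": "user_a", "created_at": "2024-10-01T12:00:00Z", "text": "first"},
    {"author": "user_a", "created_at": "2024-11-05T08:30:00Z", "text": "second"},
    {"author": "user_a", "created_at": "2024-10-20T17:45:00Z", "text": "third"},
    {"author": "user_b", "created_at": "2024-11-01T09:00:00Z", "text": "only"},
]
recent = most_recent_posts(toy_posts, limit=2)
```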
The file 'Bluesky_dataset_on_us_politics.parquet' is similar to 'Main_dataset_for_stance_detection.parquet', but it contains all posts (including both original posts and reposts) from each of the 8,022 + 445 = 8,467 users.
The file 'Like_network.parquet' captures users' interactions through likes. Specifically, it contains the number of likes each user has given to original posts made by other users. It includes likes from 8,022 + 445 = 8,467 users, but it is not limited to interactions from these users alone.
The files 'Repost_network.parquet' and 'Quote_network.parquet' are similar to 'Like_network.parquet', but they capture users' interactions through reposts and quotes, respectively.
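Each of the three network files can be thought of as a weighted edge list, where the weight is the number of interactions one user directed at another user's posts. The following sketch shows one way such counts could be derived from raw interaction events. The event layout (pairs of source and target user) and the name `build_interaction_network` are assumptions for illustration, and the actual column names in the parquet files are not specified here.

```python
from collections import Counter


def build_interaction_network(events):
    """Count interactions from each source user to each *other* user."""
    weights = Counter()
    for source, target in events:
        if source != target:  # skip interactions with one's own posts
            weights[(source, target)] += 1
    return weights


# Toy like events as (liking user, post author); not taken from the dataset.
toy_likes = [
    ("user_a", "user_b"),
    ("user_a", "user_b"),
    ("user_b", "user_c"),
    ("user_c", "user_c"),
]
like_network = build_interaction_network(toy_likes)
```

The same function applies unchanged to repost or quote events, since only the event type differs.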
Created:
2025-01-18



