Replication Data for: PAPEA – A Modular Pipeline for the Automation of Protest Event Analysis
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/KVP7HA
Description:
Protest event analysis (PEA) is the core method for understanding the spatial patterns and temporal dynamics of protest. It has been widely used for empirical analysis and theory building on social movements and contentious politics. However, the method is time- and resource-intensive because it usually depends on manual annotation. Its application is thus mostly limited to selected national newspapers or newswires, often sampling newspaper issues or days to reduce the amount of data to annotate. Advances in Natural Language Processing (NLP) have provided Large Language Models (LLMs) as powerful tools for identifying and classifying relevant text segments. As we show in this paper, with these tools the automated classification of protest events, and of political event data more broadly, can reach levels of accuracy comparable to those of human annotators while reducing the necessary annotation time by several orders of magnitude. We propose PAPEA, a modular pipeline for the automation of PEA based on various fine-tuned LLMs. The pipeline uses publicly available models and tools and can thus be easily adapted and extended. With it, we get from newspaper articles to PEA datasets with high levels of precision and without human intervention, preparing the ground for an almost real-time analysis of protest dynamics. We illustrate the potential of PAPEA with the use case of a large dataset of German-language local newspaper articles.
Created:
2025-07-10



