Replication Data for: PAPEA – A Modular Pipeline for the Automation of Protest Event Analysis

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://doi.org/10.7910/DVN/KVP7HA
Description:
Protest event analysis (PEA) is the core method for understanding the spatial patterns and temporal dynamics of protest. It has been used widely for empirical analysis and theory building on social movements and contentious politics. However, the method is time- and resource-intensive because it usually depends on manual annotation. Its application is thus mostly limited to selected national newspapers or newswires, often sampling newspaper issues or days to reduce the amount of data to annotate. Advances in Natural Language Processing (NLP) have provided Large Language Models (LLMs) as powerful tools for identifying and classifying relevant text segments. As we will show in this paper, using these tools, the automated classification of protest events, and of political event data more broadly, can reach levels of accuracy comparable to humans while reducing the necessary annotation time by several orders of magnitude. We propose a modular pipeline for the automation of PEA based on various fine-tuned LLMs. Our pipeline uses publicly available models and tools and can thus be easily adapted and extended. With this pipeline, we get from newspaper articles to PEA datasets with high levels of precision without human intervention, preparing the ground for an almost real-time analysis of protest dynamics. We illustrate the potential of PAPEA with the use case of a large dataset of German-language local newspaper articles.
Created:
2025-07-10