Replication Data for: Extractive versus Generative Language Models for Political Conflict Text Classification
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://doi.org/10.7910/DVN/KDO5AM
Description:
This replication package contains all data, code, and pre-computed model outputs necessary to reproduce the 9 tables and 4 figures in the manuscript "Extractive versus Generative Language Models for Political Conflict Text Classification," forthcoming in Political Analysis. The study compares specialized, fine-tuned "extractive" models (e.g., ConfliBERT) with general-purpose "generative" large language models (LLMs) such as Llama and Gemma. The models are evaluated on three core political text analysis tasks: binary conflict classification, multi-label event classification, and named entity recognition (NER).

The package is fully automated via a master run.sh script and includes analysis code written in both R and Python. It supports two modes of operation:

Verification (default): a fast run (under 2 minutes) that uses the provided pre-computed model outputs to generate all manuscript results.
Full Recreation (optional): a computationally expensive run (several hours) that reproduces all model predictions from the raw source data, requiring a local Ollama setup and compatible hardware.

For complete instructions on setting up the computational environment and executing the scripts, please consult the README.md file included in this package.
Created:
2025-10-23



