Replication Data for: Extractive versus Generative Language Models for Political Conflict Text Classification

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://doi.org/10.7910/DVN/KDO5AM

Description:

This replication package contains all data, code, and pre-computed model outputs necessary to reproduce the 9 tables and 4 figures in the manuscript, "Extractive versus Generative Language Models for Political Conflict Text Classification," forthcoming in Political Analysis. The study provides a comprehensive comparison between specialized, fine-tuned "extractive" models (e.g., ConfliBERT) and general-purpose "generative" large language models (LLMs) like Llama and Gemma. The models are evaluated on three core political text analysis tasks: binary conflict classification, multi-label event classification, and named entity recognition (NER).

The package is fully automated via a master run.sh script and includes analysis code written in both R and Python. It is structured to support two modes of operation:

Verification (Default): A fast run (under 2 minutes) that uses the provided pre-computed model outputs to generate all manuscript results.

Full Recreation (Optional): A computationally expensive run (several hours) that reproduces all model predictions from the raw source data, requiring a local Ollama setup and compatible hardware.

For complete instructions on setting up the computational environment and executing the scripts, please consult the README.md file included in this package.
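For readers who prefer to fetch the package programmatically rather than through the DOI landing page, the following is a minimal sketch. It assumes the DOI resolves to a dataset on Harvard Dataverse (the 10.7910/DVN prefix suggests this) and uses the public Dataverse API routes for dataset metadata and whole-dataset download. The helper names metadata_url, bundle_url, and download_bundle, and the default file name replication_package.zip, are hypothetical and not part of the package.

```python
import shutil
import urllib.request
from urllib.parse import urlencode

# Assumption: the dataset is hosted on Harvard Dataverse.
DATAVERSE_HOST = "https://dataverse.harvard.edu"
DOI = "doi:10.7910/DVN/KDO5AM"


def metadata_url(doi: str = DOI, host: str = DATAVERSE_HOST) -> str:
    """Build the native API URL that returns dataset metadata as JSON."""
    query = urlencode({"persistentId": doi})
    return f"{host}/api/datasets/:persistentId/?{query}"


def bundle_url(doi: str = DOI, host: str = DATAVERSE_HOST) -> str:
    """Build the access API URL that returns all dataset files as one zip."""
    query = urlencode({"persistentId": doi})
    return f"{host}/api/access/dataset/:persistentId/?{query}"


def download_bundle(dest: str = "replication_package.zip") -> str:
    """Stream the zip bundle to disk (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(bundle_url()) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Calling download_bundle() writes the archive to the current directory. After unpacking, the README.md in the package remains the authoritative guide for running run.sh in either mode.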

Created:

2025-10-23
