Replication Data for: Troubles in Text: Using Natural Language Processing to recognize government rationalizations for rights abuses

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://doi.org/10.7910/DVN/D5DXCE

Description:

Do natural language processing (NLP) innovations in capturing words’ contextual meanings yield increased performance in computationally recognizing political concepts in text? We evaluate how one model (RoBERTa) performs on a uniquely challenging task: classifying government rationalizations for internment without trial within imperfectly digitized archive text. RoBERTa reliably outperforms conventional supervised methods at identifying and classifying internment rationalizations but remains inadequate for certain objectives. However, with proper model specifications and targeted manual interventions, RoBERTa serves as a reliable tool to reduce manual annotation. Our illustrative example demonstrates how researchers may combine NLP and manual annotation to analyze context-specific government policy discussions and/or poorly digitized historical records. RoBERTa and similar models would likely achieve even stronger performance on contemporary texts. This article demonstrates the value and limitations of using NLP to classify political concepts, discusses when applying such models could be beneficial, and offers practical instruction for NLP application in political science.

Created:

2024-07-29
