Replication Data for: Troubles in Text: Using Natural Language Processing to recognize government rationalizations for rights abuses

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/D5DXCE
Description:
Do natural language processing (NLP) innovations in capturing words’ contextual meanings yield increased performance in computationally recognizing political concepts in text? We evaluate how one model (RoBERTa) performs on a uniquely challenging task: classifying government rationalizations for internment without trial within imperfectly digitized archive text. RoBERTa reliably outperforms conventional supervised methods at identifying and classifying internment rationalizations but remains inadequate for certain objectives. However, with proper model specifications and targeted manual interventions, RoBERTa serves as a reliable tool to reduce manual annotation. Our illustrative example demonstrates how researchers may combine NLP and manual annotation to analyze context-specific government policy discussions and/or poorly digitized historical records. RoBERTa and similar models would likely achieve even stronger performance on contemporary texts. This article demonstrates the value and limitations of using NLP to classify political concepts, discusses when applying such models could be beneficial, and offers practical instruction for NLP application in political science.
Created:
2024-07-29