Frameworks: Infrastructure for Political and Social Event Data Using Machine Learning
Source: DataCite Commons | Updated: 2025-07-24 | Indexed: 2025-09-08
Download link:
https://figshare.com/articles/dataset/Frameworks_Infrastructure_for_Political_and_Social_Event_Data_Using_Machine_Learning/29640821/1
Description:
The study of political violence is a central concern for researchers and analysts, who have devoted significant resources to monitoring, understanding, and predicting political violence at a global scale. Building on prior NSF grants (OAC-1931541, SBE-SMA-1539302), this research develops and extends ConfliBERT, a domain-specific language model trained on an expert-curated corpus about conflict and political violence (Hu et al. 2022). ConfliBERT improves performance on downstream tasks for conflict research while substantially reducing human annotation effort. We expand ConfliBERT to multilingual settings including Arabic and Spanish, update our corpora in sustainable ways, retrain ConfliBERT on a continuous basis, provide new political network data, and develop our language models so that users can create customized datasets and applications.
Provider:
figshare
Created:
2025-07-24



