
NLP and machine learning to measure peace from news media

DataCite Commons | Updated 2026-03-14 | Indexed 2025-04-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.2v6wwpzv6
Description:
“Hate speech” can mobilize violence and destruction. What are the characteristics of “peace speech” that reflect and support the social processes that maintain peace? In this study we used a data-driven machine learning approach to identify the words most associated with lower-peace versus higher-peace countries. Logistic regression and random forest classifiers were trained using five respected, traditional peace indices: the Global Peace Index, Positive Peace Index, World Happiness Index, Fragile States Index, and Human Development Index. The input features were the word frequencies in each country's news media, and the output classes were that country's level of peace. The machine learning model correctly classified a country's level of peace from its news media (accuracy and F1 both 96% to 100%). We also used the trained model to create a machine learning peace index that measured the level of peace in countries, including countries not in the training set; this index correlated with the average of the five traditional peace indices (r-squared = 0.8349). Using random forest feature importance, we found that news media in lower-peace countries were characterized by words related to government, order, control, and fear (such as government, state, law, security, and court), while news media in higher-peace countries were characterized by an increased prevalence of words related to optimism for the future and fun (such as time, like, home, believe, and game).
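The description names two computations that a short sketch can make concrete: turning news text into word-frequency features, and scoring agreement between two indices with r-squared. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline. The function names `word_frequencies` and `r_squared`, the lowercase letters-only tokenizer, and the toy sentence are all hypothetical. Treating r-squared as the squared Pearson correlation is also an assumption. The vocabulary is limited to the ten example words quoted above, and classifier training is omitted.

```python
import re
from collections import Counter

# Example words quoted in the description (five per group).
VOCAB = ["government", "state", "law", "security", "court",
         "time", "like", "home", "believe", "game"]


def word_frequencies(text, vocabulary):
    """Return the relative frequency of each vocabulary word in `text`.

    Assumption: tokens are runs of ASCII letters after lowercasing.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in vocabulary]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Made-up sentence, for illustration only (not from the dataset).
toy_text = ("The government said the court would review the law. "
            "The state tightened security.")
features = word_frequencies(toy_text, VOCAB)
```

In a fuller pipeline, one feature vector per country would be passed to a classifier, and `r_squared` would compare the resulting index against the average of the five traditional indices.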
Provider:
Dryad
Created:
2023-11-15