
The Global Leader Personality Dataset (GLPD), 1946-2022

Source: DataONE. Updated 2026-02-04. Indexed 2026-03-14.
Download link:
https://search.dataone.org/view/sha256:0e29ff4131e93baebbed2d12a506a512d12f12c0a929cd7f5b499cf934d99457
Description:
This data project generates personality scores for more than 10,000 global leaders, spanning a period of 75 years (1946-2022). Speeches, organized by year, are bootstrapped 100 times at the sentence level and processed through two psycholinguistic dictionaries, LIWC and MRCPD. This step yields counts and proportions of word usage for every linguistic category in those dictionaries. The counts and proportions are then fed into a pre-trained Personality Recognizer model (Mairesse et al. 2007), in line with similar earlier work on US legislators (Ramey et al. 2017). Using SMOreg (support vector machine regression), the model returns Big Five personality scores (openness, conscientiousness, extraversion, agreeableness, and emotional stability) on a scale of 1 to 7, where a larger value indicates that the speaker ranks higher on the trait.

By analyzing thousands of speeches delivered at the United Nations General Debates (UNGD) to detect the personalities of global leaders, this project contributes to the growing body of first-image research, particularly the psychological approach in International Relations and its Foreign Policy Analysis stream. Theoretically, it advances trait models within that approach by building on the lexical theory of personality psychology to examine the traits of global elites. Decades of research have shown that linguistic cues, such as utterances, convey not only semantic information but also aspects of the speaker, including personality traits.

Methodologically, to the best of my knowledge, this project is the first attempt in International Relations to analyze the linguistic cues of global leaders with machine learning tools in order to assess their psychological characteristics within the Big Five framework. Because the Big Five is stable across the adult lifespan and generalizes across domains and cultures, it provides a parsimonious tool for studying the personality traits of global leaders and reduces the concerns about context- and time-specificity that affect other established content-analytic at-a-distance methods, such as Leadership Trait Analysis (LTA).

Additionally, this project and related works based on this dataset demonstrate that translated speeches can capture the nuances of native languages using existing psycholinguistic dictionaries, even without a language-specific coding scheme. Moreover, I show that the psycholinguistic properties of UNGD speeches are highly comparable to those of more spontaneous forms of text, such as media interviews, which are often considered to reflect a "truer" personality. These findings give scholars greater confidence in using widely available public statements for analysis, despite the scarcity of more spontaneous textual statements.
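The feature-extraction step described above (sentence-level bootstrapping followed by dictionary-based counts and proportions) can be illustrated with a minimal Python sketch. This is not the project's code. The toy dictionary, the naive sentence splitter, and every function name below are assumptions made for illustration; the actual LIWC and MRCPD lexicons and the pre-trained Personality Recognizer regression step are not reproduced here.

```python
import random
import re
from collections import Counter
from statistics import mean

# Hypothetical toy dictionary (category -> words). It stands in for the
# LIWC / MRCPD categories, which are NOT reproduced here.
TOY_DICTIONARY = {
    "positive": {"peace", "hope", "progress"},
    "negative": {"war", "threat", "crisis"},
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def category_proportions(sentences, dictionary):
    """Return, per category, the share of all word tokens in that category."""
    tokens = [t for s in sentences for t in re.findall(r"[a-z']+", s.lower())]
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return {c: (counts[c] / total if total else 0.0) for c in dictionary}


def bootstrap_features(text, dictionary, n_replicates=100, seed=0):
    """Resample sentences with replacement and score each replicate."""
    rng = random.Random(seed)
    sentences = split_sentences(text)
    replicates = []
    for _ in range(n_replicates):
        # Each replicate has as many sentences as the original speech.
        sample = rng.choices(sentences, k=len(sentences)) if sentences else []
        replicates.append(category_proportions(sample, dictionary))
    return replicates


speech = "We seek peace. The threat of war remains. There is hope for progress!"
replicates = bootstrap_features(speech, TOY_DICTIONARY)

# Average each category's proportion over the 100 bootstrap replicates.
mean_features = {c: mean(r[c] for r in replicates) for c in TOY_DICTIONARY}
print(mean_features)
```

In the pipeline the description outlines, feature vectors of this kind would then be passed to the pre-trained regression model to obtain trait scores on the 1-to-7 scale; that step is omitted here because it depends on the original model files.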

Created: 2026-02-07